Knowledge Base

`KnowledgeBase`

Bases: SynalinksSaveable

A knowledge base for storing and retrieving structured data.

The `KnowledgeBase` provides a unified interface over two complementary stores: a SQL row/table store (DuckDB by default) and a property-graph store (LadybugDB by default). The two are orthogonal: SQL methods (`update`, `sql`, `similarity_search`, ...) route to the SQL adapter; graph methods (`update_entities`, `cypher`, `entity_similarity_search`, ...) route to the graph adapter.

A no-args `KnowledgeBase()` instantiates BOTH stores under `synalinks_home()` (`database.db` for SQL, `database.lb` for the graph) so the two sides are usable side-by-side without setup. Pass `uri=` alone for SQL-only, `graph_uri=` alone for graph-only, or both to point each side at a custom location.
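
The store-selection rule above can be sketched as a small pure function. This is only an illustration of the described behaviour: `select_stores` is a hypothetical helper name, not part of the synalinks API, and it does not open any database.

```python
from typing import Optional, Tuple


def select_stores(
    uri: Optional[str] = None,
    graph_uri: Optional[str] = None,
) -> Tuple[bool, bool]:
    """Return (want_sql, want_graph) for a given pair of URIs.

    Hypothetical helper that models the rule in the text only.
    """
    # No URI at all: both stores are created under synalinks_home().
    auto_pair = uri is None and graph_uri is None
    # An explicit URI on one side opts out of the default on the other side.
    want_sql = uri is not None or auto_pair
    want_graph = graph_uri is not None or auto_pair
    return want_sql, want_graph
```

Under this sketch, passing only `uri=` yields a SQL-only knowledge base, so no graph file appears on disk unless it was requested or both URIs were omitted.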

Basic Usage

import synalinks

class Document(synalinks.DataModel):
    id: str
    title: str
    content: str

# Create a knowledge base without embeddings (full-text search only)
knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://my_database.db",
    data_models=[Document],
)

# Store a document
doc = Document(id="1", title="Hello", content="Hello World!")
await knowledge_base.update(doc.to_json_data_model())

# Retrieve by ID (the first field, here 'id', is the primary key — see
# the "Primary Key Convention" section below).
result = await knowledge_base.get("1", table_name="Document")

# Full-text search
results = await knowledge_base.fulltext_search("Hello", table_name="Document", k=10)

Primary Key Convention

Synalinks does not inject a synthetic `uuid` / `_id` column. The primary key is the first declared field of your DataModel, in declaration order, after skipping reserved structural fields:

* For SQL tables (DuckDB): the first property of the schema.
* For graph entities (Ladybug nodes): the first property after `label`. `label` is the node-table name, not a column.
* For graph relations (Ladybug edges): the first property after `subj` / `label` / `obj`. Those three are reserved: the endpoints are resolved against the node tables, and the label is the edge-table name.

Because the PK is just "whichever field you declared first", a KnowledgeBase can be pointed at a pre-existing DuckDB file or LadybugDB store without rewriting rows or renaming columns: declare your DataModel so its first field matches the column you already treat as the identifier (`id`, `ticker`, `isbn`, `email`, whatever it happens to be) and the adapters will use it. If you want a UUID-style key, declare it explicitly as the first field and populate it yourself; generating identifiers is the caller's job, not the framework's.
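
The convention can be illustrated with a few lines of plain Python. This is a sketch under assumptions: `primary_key_field` and the two `*_RESERVED` tuples are hypothetical names introduced here, and the helper skips reserved names wherever they appear, which may be more lenient than what the adapters actually do.

```python
from typing import List, Optional, Sequence

# Reserved structural fields named in the text above (illustrative constants).
ENTITY_RESERVED = ("label",)
RELATION_RESERVED = ("subj", "label", "obj")


def primary_key_field(
    field_names: List[str],
    reserved: Sequence[str] = (),
) -> Optional[str]:
    """Return the first declared field that is not a reserved structural field."""
    for name in field_names:
        if name not in reserved:
            return name
    # Every declared field was reserved, so there is no usable key.
    return None


# SQL table: the first property is the key.
sql_pk = primary_key_field(["id", "title", "content"])
# Graph entity: `label` is skipped.
entity_pk = primary_key_field(["label", "name", "description"], ENTITY_RESERVED)
# Graph relation: `subj`, `label` and `obj` are skipped.
relation_pk = primary_key_field(["subj", "label", "obj", "since"], RELATION_RESERVED)
```

Reordering the fields of a model therefore changes its key, which is why the first field should match the identifier column of any pre-existing store.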

With Vector Similarity Search

embedding_model = synalinks.EmbeddingModel(
    model="ollama/mxbai-embed-large"
)

knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://./my_database.db",
    data_models=[Document],
    embedding_model=embedding_model,
    metric="cosine",
)

# Hybrid search (combines BM25 fulltext + vector similarity, fused by RRF)
results = await knowledge_base.hybrid_fts_search("semantic query", k=10)
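
For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the general technique on two ranked lists of ids. It is not the synalinks implementation: `reciprocal_rank_fusion` and `rrf_k` are hypothetical names, and the constant 60 is the value commonly used in the RRF literature, assumed here for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    rrf_k: int = 60,
) -> List[str]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (rrf_k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit of each list contributes the most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["a", "b", "c"]    # ids ranked by a full-text search
vector_hits = ["b", "c", "d"]  # ids ranked by a vector search
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF uses only ranks, it needs no score normalization between BM25 scores and vector distances, which live on different scales.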

Retrieving Table Definitions

# Get all symbolic data models (table definitions) from the database
symbolic_models = knowledge_base.get_symbolic_data_models()

for model in symbolic_models:
    print(model.get_schema())
    # {'title': 'Document', 'type': 'object', 'properties': {...}, ...}

Parameters:

Name	Type	Description	Default
`uri`	`str`	SQL store connection URI (`"duckdb://path/to/db.db"`). When both `uri` and `graph_uri` are omitted, defaults to `{synalinks_home()}/{name or 'database'}.db`. Pass `uri` alone to opt out of the graph-side default.	`None`
`graph_uri`	`str`	Graph store connection URI (`"ladybug://path/to/graph.lb"` or `"ladybug://:memory:"`). When both URIs are omitted, defaults to `{synalinks_home()}/{name or 'database'}.lb`. Pass `graph_uri` alone to opt out of the SQL-side default.	`None`
`data_models`	`list`	Optional list of DataModel or SymbolicDataModel classes to create tables for in the SQL store.	`None`
`entity_models`	`list`	Optional list of entity (node) models for the graph store.	`None`
`relation_models`	`list`	Optional list of relation (edge) models for the graph store.	`None`
`embedding_model`	`EmbeddingModel`	Optional embedding model for vector similarity search; forwarded to both stores.	`None`
`metric`	`str`	The distance metric for vector search. Options: "cosine", "l2sq", "ip" (default: "cosine").	`'cosine'`
`wipe_on_start`	`bool`	Whether to clear the database on initialization (default: False).	`False`
`name`	`str`	Optional name for the knowledge base (used for serialization and as the filename stem for the default `.synalinks` paths).	`None`
`encryption_key`	`str`	Optional at-rest encryption key for the SQL store. Not forwarded to the graph store (LadybugDB has no encryption-at-rest support).	`None`
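
As a reminder of what the three `metric` options usually denote, here is a plain-Python sketch of the underlying formulas. The function names are hypothetical, and the exact sign convention the index uses for `"ip"` (for example, ranking by negative inner product) is an assumption not confirmed by this page.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(a, b); 0.0 for vectors pointing in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2sq_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; larger means more similar for comparable-norm vectors."""
    return sum(x * y for x, y in zip(a, b))
```

For unit-normalized embeddings the three metrics produce the same ranking, so the choice matters mainly when vector norms vary.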

Source code in synalinks/src/knowledge_bases/knowledge_base.py

@synalinks_export("synalinks.KnowledgeBase")
class KnowledgeBase(SynalinksSaveable):
    """A knowledge base for storing and retrieving structured data.

    The KnowledgeBase provides a unified interface over two complementary
    stores: a SQL row/table store (DuckDB by default) and a property-graph
    store (LadybugDB by default). The two are orthogonal — SQL methods
    (``update``, ``sql``, ``similarity_search``, ...) route to the SQL
    adapter; graph methods (``update_entities``, ``cypher``,
    ``entity_similarity_search``, ...) route to the graph adapter.

    A no-args ``KnowledgeBase()`` instantiates BOTH stores under
    ``synalinks_home()`` (``database.db`` for SQL, ``database.lb`` for
    the graph) so the two sides are usable side-by-side without setup.
    Pass ``uri=`` alone for SQL-only, ``graph_uri=`` alone for
    graph-only, or both to point each side at a custom location.

    ### Basic Usage

    ```python
    import synalinks

    class Document(synalinks.DataModel):
        id: str
        title: str
        content: str

    # Create a knowledge base without embeddings (full-text search only)
    knowledge_base = synalinks.KnowledgeBase(
        uri="duckdb://my_database.db",
        data_models=[Document],
    )

    # Store a document
    doc = Document(id="1", title="Hello", content="Hello World!")
    await knowledge_base.update(doc.to_json_data_model())

    # Retrieve by ID (the first field, here 'id', is the primary key — see
    # the "Primary Key Convention" section below).
    result = await knowledge_base.get("1", table_name="Document")

    # Full-text search
    results = await knowledge_base.fulltext_search(
        "Hello", table_name="Document", k=10
    )
    ```

    ### Primary Key Convention

    Synalinks does not inject a synthetic ``uuid`` / ``_id`` column. The
    primary key is the **first declared field** of your DataModel, in
    declaration order, after skipping reserved structural fields:

    * For SQL tables (DuckDB): the first property of the schema.
    * For graph entities (Ladybug nodes): the first property after
      ``label``. ``label`` is the node-table name, not a column.
    * For graph relations (Ladybug edges): the first property after
      ``subj`` / ``label`` / ``obj``. Those three are reserved — the
      endpoints are resolved against the node tables, and the label is
      the edge-table name.

    Because the PK is just "whichever field you declared first", a
    KnowledgeBase can be pointed at a pre-existing DuckDB file or
    LadybugDB store without rewriting rows or renaming columns: declare
    your DataModel so its first field matches the column you already
    treat as the identifier (``id``, ``ticker``, ``isbn``, ``email``,
    whatever it happens to be) and the adapters will use it. If you
    *want* a UUID-style key, declare it explicitly as the first field
    and populate it yourself — generating identifiers is the caller's
    job, not the framework's.

    ### With Vector Similarity Search

    ```python
    embedding_model = synalinks.EmbeddingModel(
        model="ollama/mxbai-embed-large"
    )

    knowledge_base = synalinks.KnowledgeBase(
        uri="duckdb://./my_database.db",
        data_models=[Document],
        embedding_model=embedding_model,
        metric="cosine",
    )

    # Hybrid search (combines BM25 fulltext + vector similarity, fused by RRF)
    results = await knowledge_base.hybrid_fts_search("semantic query", k=10)
    ```

    ### Retrieving Table Definitions

    ```python
    # Get all symbolic data models (table definitions) from the database
    symbolic_models = knowledge_base.get_symbolic_data_models()

    for model in symbolic_models:
        print(model.get_schema())
        # {'title': 'Document', 'type': 'object', 'properties': {...}, ...}
    ```

    Args:
        uri (str): SQL store connection URI (``"duckdb://path/to/db.db"``).
            When both ``uri`` and ``graph_uri`` are omitted, defaults to
            ``{synalinks_home()}/{name or 'database'}.db``. Pass ``uri``
            alone to opt out of the graph-side default.
        graph_uri (str): Graph store connection URI
            (``"ladybug://path/to/graph.lb"`` or
            ``"ladybug://:memory:"``). When both URIs are omitted,
            defaults to ``{synalinks_home()}/{name or 'database'}.lb``.
            Pass ``graph_uri`` alone to opt out of the SQL-side default.
        data_models (list): Optional list of DataModel or SymbolicDataModel
            classes to create tables for in the SQL store.
        entity_models (list): Optional list of entity (node) models for
            the graph store.
        relation_models (list): Optional list of relation (edge) models
            for the graph store.
        embedding_model (EmbeddingModel): Optional embedding model for
            vector similarity search; forwarded to both stores.
        metric (str): The distance metric for vector search.
            Options: "cosine", "l2sq", "ip" (default: "cosine").
        wipe_on_start (bool): Whether to clear the database on initialization
            (default: False).
        name (str): Optional name for the knowledge base (used for serialization
            and as the filename stem for the default ``.synalinks`` paths).
        encryption_key (str): Optional at-rest encryption key for the SQL
            store. Not forwarded to the graph store (LadybugDB has no
            encryption-at-rest support).
    """

    def __init__(
        self,
        *,
        uri=None,
        graph_uri=None,
        data_models=None,
        entity_models=None,
        relation_models=None,
        embedding_model=None,
        metric="cosine",
        wipe_on_start=False,
        name=None,
        encryption_key=None,
        **kwargs,
    ):
        # Two adapters can coexist on a single KnowledgeBase:
        #   * `sql_adapter` — row/table store, selected by `uri`
        #     (e.g. duckdb://...). Default backend is DuckDB.
        #   * `graph_adapter` — property-graph store, selected by
        #     `graph_uri` (e.g. ladybug://...). Default backend is
        #     LadybugDB.
        # The two stores are complementary, so a no-args
        # ``KnowledgeBase()`` instantiates BOTH against the same
        # ``synalinks_home()`` directory (``database.db`` for SQL,
        # ``database.lb`` for the graph). Passing only ``uri=`` keeps
        # the call SQL-only; passing only ``graph_uri=`` keeps it
        # graph-only — explicit URIs opt out of auto-pairing so a
        # caller targeting one engine isn't surprised by a second
        # file appearing on disk.
        self.sql_adapter = None
        self.graph_adapter = None

        auto_pair = uri is None and graph_uri is None
        want_sql = uri is not None or auto_pair
        want_graph = graph_uri is not None or auto_pair

        if want_sql:
            self.sql_adapter = database_adapters.get(uri)(
                uri=uri,
                data_models=data_models,
                embedding_model=embedding_model,
                metric=metric,
                wipe_on_start=wipe_on_start,
                name=name,
                encryption_key=encryption_key,
                **kwargs,
            )

        if want_graph:
            # `encryption_key` is intentionally NOT forwarded here:
            # LadybugDB has no encryption-at-rest support. A user that
            # passes it for a dual-adapter KB gets DuckDB encryption
            # for the SQL side and an unencrypted Ladybug graph store
            # (which is the same as if they'd omitted the kwarg).
            self.graph_adapter = graph_database_adapters.get(graph_uri)(
                uri=graph_uri,
                entity_models=entity_models,
                relation_models=relation_models,
                embedding_model=embedding_model,
                metric=metric,
                wipe_on_start=wipe_on_start,
                name=name,
                **kwargs,
            )

        self.uri = uri
        self.graph_uri = graph_uri
        self.data_models = data_models or []
        self.entity_models = entity_models or []
        self.relation_models = relation_models or []
        self.embedding_model = _get_em(embedding_model)
        self.metric = metric
        self.wipe_on_start = wipe_on_start
        if not name:
            self.name = auto_name("knowledge_base")
        else:
            self.name = name
        # `encryption_key` is deliberately NOT stored on `self` — it
        # lives only inside the adapter, and only as long as the
        # adapter does. This keeps the secret out of `get_config()`,
        # off-screen during repr/print, and unreferenced by any
        # serialization path. Callers must re-supply the key when
        # constructing a new KnowledgeBase against an encrypted file.

    async def update(
        self,
        data_model_or_data_models: Union[Any, List[Any], Dataset],
        *,
        verbose="auto",
    ) -> Union[Any, List[Any]]:
        """Insert or update records in the knowledge base.

        Args:
            data_model_or_data_models (JsonDataModel | List[JsonDataModel] | Dataset):
                A single ``JsonDataModel``, a list of ``JsonDataModel`` /
                ``DataModel`` instances, or a synalinks ``Dataset``.
                The ``Dataset`` form streams the source batch-by-batch
                (one ``adapter.update`` call per yielded batch) so memory
                stays bounded for large CSV / Parquet / HuggingFace
                sources. The dataset must be inputs-only — no
                ``output_template`` — because the knowledge base stores
                records, not ``(input, target)`` pairs; pass a
                labeled dataset and you'll get a ``ValueError``.

                Upserts key off the first declared field of the model —
                see the "Primary Key Convention" section on the class
                docstring for how that's resolved (and why no UUID is
                injected).
            verbose (int | str): ``"auto"``, ``0``, ``1``, or ``2``.
                Verbosity for the ``Dataset`` path; matches the
                trainer's ``fit()`` semantics. ``"auto"`` (default)
                resolves to ``1`` when a ``Dataset`` is passed (a
                per-batch progress bar — same widget ``fit()`` uses,
                with ETA when ``len(dataset)`` is known) and is a
                no-op for the scalar / list forms, which finish in a
                single adapter call.

        Returns:
            The primary key value(s) of the inserted/updated records.
            Scalar in / scalar out; list in / list out; ``Dataset`` in /
            flat list of every batch's ids concatenated.
        """
        if isinstance(data_model_or_data_models, Dataset):
            return await self._update_from_dataset(
                data_model_or_data_models, verbose=verbose
            )
        return await self.sql_adapter.update(data_model_or_data_models)

    async def _update_from_dataset(
        self, dataset: Dataset, *, verbose="auto"
    ) -> List[Any]:
        """Stream a ``Dataset`` into the adapter one batch at a time.

        Each batch yielded by the dataset is converted to a list of
        DataModel / JsonDataModel instances and handed to
        ``adapter.update``. The returned ids from every batch are
        accumulated into one flat list — same order as the dataset
        produced them.

        Inputs-only is enforced: a dataset configured with an
        ``output_template`` represents ``(input, target)`` training
        data, which isn't what the knowledge base stores. The check is
        the dataset's public ``output_template`` attribute, not the
        per-batch tuple length — so the rejection happens upfront,
        before any rows are consumed.
        """
        if dataset.output_template is not None:
            raise ValueError(
                "KnowledgeBase.update accepts only inputs-only datasets "
                "(no `output_template`). The knowledge base stores "
                "records, not (input, target) pairs."
            )

        # "auto" → 1 in the Dataset branch (we know there's iteration to
        # display). Outside this branch verbose is dead anyway.
        if verbose == "auto":
            verbose = 1

        progbar = None
        if verbose:
            try:
                target = len(dataset)
            except (TypeError, NotImplementedError):
                target = None
            progbar = Progbar(target=target, verbose=verbose, unit_name="batch")

        ids: List[Any] = []
        step = 0
        for batch in dataset:
            x = batch[0]
            if len(x) == 0:
                continue
            batch_ids = await self.sql_adapter.update(list(x))
            if isinstance(batch_ids, list):
                ids.extend(batch_ids)
            else:
                ids.append(batch_ids)
            step += 1
            if progbar is not None:
                progbar.update(step, values=[("rows", len(ids))])
        if progbar is not None:
            progbar.update(step, values=[("rows", len(ids))], finalize=True)
        return ids

    async def from_csv(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
        delimiter: str = ",",
        encoding: str = "utf-8",
        header: bool = True,
    ) -> Any:
        """Bulk-load a CSV file directly into the knowledge base.

        Skips the Python row pipeline entirely (no Pydantic, no Jinja,
        no per-row INSERT) and instead delegates to the database's
        native CSV reader. Roughly two orders of magnitude faster than
        ``update(CSVDataset(...))`` for non-trivial files — see
        ``benchmarks/bench_kb_ingest.py``.

        The target table's schema is inferred directly from the
        file's columns, with the first column promoted to PRIMARY
        KEY. The returned `SymbolicDataModel` is the handle
        you pass to subsequent search / get calls — you don't need
        to pre-declare a ``DataModel`` for this table.

        Use the streaming ``update(<...>Dataset(...))`` path instead
        when source rows need transformation before storage (column
        renames, derived fields, HuggingFace datasets, etc.).

        Args:
            path: Path to the CSV file.
            table_name: Target table name. Defaults to the file's stem
                (``/data/my-docs.csv`` → ``MyDocs``). Whatever value
                lands here is always normalized to PascalCase.
            table_description: Optional natural-language description
                attached to the resulting schema.
            delimiter: Field delimiter. Defaults to ``","``.
            encoding: File encoding. Defaults to ``"utf-8"``.
            header: Whether the first row is a header. Defaults to
                ``True``.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_csv(
            path,
            table_name=table_name,
            table_description=table_description,
            delimiter=delimiter,
            encoding=encoding,
            header=header,
        )

    async def from_parquet(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Bulk-load a Parquet file directly into the knowledge base.

        Same trade-offs as `from_csv` — bypasses the Python row
        pipeline for native database ingestion. Parquet's schema is
        explicit in the file footer so there is no type-inference
        guesswork to worry about.

        Args:
            path: Path to the Parquet file.
            table_name: Target table name. Defaults to the file's stem
                coerced to PascalCase.
            table_description: Optional schema description.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_parquet(
            path, table_name=table_name, table_description=table_description
        )

    async def from_json(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Bulk-load a JSON file (top-level array of objects).

        Same trade-offs as `from_csv` / `from_parquet` —
        bypasses the Python row pipeline. The file must contain a
        top-level JSON array. Use `from_jsonl` for the
        one-object-per-line NDJSON format.

        Args:
            path: Path to the JSON file.
            table_name: Target table name. Defaults to the file's stem
                coerced to PascalCase.
            table_description: Optional schema description.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_json(
            path, table_name=table_name, table_description=table_description
        )

    async def from_jsonl(
        self,
        path: str,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Bulk-load a JSON Lines (NDJSON) file.

        Same trade-offs as `from_csv` / `from_parquet`,
        and the right call for very large JSON sources that aren't
        a single array.

        Args:
            path: Path to the JSONL file.
            table_name: Target table name. Defaults to the file's stem
                coerced to PascalCase.
            table_description: Optional schema description.

        Returns:
            The `SymbolicDataModel` for the loaded table.
        """
        return await self.sql_adapter.from_jsonl(
            path, table_name=table_name, table_description=table_description
        )

    async def rename(
        self,
        source: Any,
        *,
        table_name: Optional[str] = None,
        table_description: Optional[str] = None,
    ) -> Any:
        """Rename a table and/or update its description.

        Pass at least one of ``table_name`` / ``table_description``.
        When ``table_name`` is given the underlying table is
        renamed via ``ALTER TABLE …``, the FTS / vector indexes are
        rebuilt under the new name, and the adapter's known-models
        list is updated so subsequent default-table searches find
        the table under its new identity.

        Args:
            source: ``SymbolicDataModel`` or table-name string for
                the table to rename. The string form is itself
                PascalCase-normalized, so callers can pass the
                same input they used in `from_csv` (e.g.
                ``"my-docs"``).
            table_name: New table name. Always normalized to
                PascalCase.
            table_description: Optional natural-language description
                attached to the resulting schema.

        Returns:
            A fresh `SymbolicDataModel` for the (possibly
            renamed) table.
        """
        return await self.sql_adapter.rename(
            source,
            table_name=table_name,
            table_description=table_description,
        )

    async def get(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        table_name: str,
    ) -> Union[Optional[Any], List[Optional[Any]]]:
        """Retrieve one or more records by primary key from a single table.

        Args:
            id_or_ids: A single primary key value, or a list of values.
            table_name: Target table.

        Returns:
            A single JsonDataModel (or ``None``) when called with one id;
            a list of JsonDataModels (with ``None`` in the slots that did
            not match) when called with a list.
        """
        return await self.sql_adapter.get(id_or_ids, table_name=table_name)

    async def getall(
        self,
        *,
        table_name: str,
        limit: int = 50,
        offset: int = 0,
    ) -> List[Any]:
        """Retrieve all records from a table with pagination.

        Args:
            table_name: Target table.
            limit: Maximum number of records to return (default: 50).
            offset: Number of records to skip (default: 0).

        Returns:
            List of JsonDataModels.
        """
        return await self.sql_adapter.getall(
            table_name=table_name, limit=limit, offset=offset
        )

    async def delete(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        table_name: str,
    ) -> int:
        """Delete records by primary key from a single table.

        Pass a single id or a list. The FTS / vector indexes for the
        table are rebuilt afterwards so subsequent search calls
        don't return ghost rows.

        Args:
            id_or_ids: Primary key value, or a list of values.
            table_name: Target table.

        Returns:
            The number of rows actually deleted (0 if no id matched).
        """
        return await self.sql_adapter.delete(id_or_ids, table_name=table_name)

    async def drop_table(self, table_name: str) -> bool:
        """Drop a table from the knowledge base.

        Removes the table's rows, FTS index, and HNSW vector index,
        then drops the table itself. Also forgets the table in the
        adapter's known-models list.

        Args:
            table_name: Target table.

        Returns:
            ``True`` if a table was dropped, ``False`` if it didn't
            exist to begin with.
        """
        return await self.sql_adapter.drop_table(table_name)

    async def sql(
        self,
        sql: str,
        *,
        params: Optional[Dict[str, Any]] = None,
        output_format: str = "json",
        **kwargs,
    ) -> Union[List[Dict[str, Any]], str]:
        """Execute a raw SQL query against the knowledge base.

        Counterpart of `cypher` — the method is named after the
        query language so a dual-adapter KnowledgeBase has a clear
        per-language entry point.

        Args:
            sql (str): The SQL string to execute.
            params (dict): Optional dict of named parameters for
                parameterized queries.
            output_format: ``"json"`` (default, list of dicts —
                JSON-shaped Python data) or ``"csv"`` (CSV string,
                useful when handing the result to an LM).
            **kwargs (Any): Additional options. The most important one is
                ``read_only=True/False``. When ``True`` (the DuckDB adapter's
                default) two layers of defence apply:

                1. The SQL is parsed with the engine's own parser and any
                   non-``SELECT`` statement is rejected. This catches
                   multi-statement injection (e.g. ``SELECT 1; DROP TABLE x``),
                   ``COPY ... TO 'file'`` exfiltration, ``ATTACH``, ``EXPORT``,
                   and other side-effecting statements. This is the only
                   layer that blocks writes — the adapter's underlying
                   connection is read-write (one connection per adapter,
                   reused across operations), so the parser check is what
                   keeps untrusted SQL read-only.
                2. ``enable_external_access`` is disabled on that connection
                   at construction time, so ``SELECT`` table functions that
                   touch the host filesystem or network — ``read_csv``,
                   ``read_parquet``, ``read_json``, ``read_blob``,
                   ``read_text``, ``glob`` and the httpfs/S3 variants —
                   return a permission error instead of leaking files.
                   Without this layer,
                   ``SELECT * FROM read_csv('/etc/passwd', ...)`` would pass
                   defence (1) because it is a syntactically valid ``SELECT``.

                Pass ``read_only=False`` only from trusted call sites that
                genuinely need to mutate state. Those paths still run on
                the same sandboxed connection (no external I/O), but they
                bypass the parser check, so any SQL is accepted — keep them
                out of the LM-tool-call surface.

        Returns:
            (Union[List[Dict[str, Any]], str]): A list of dicts when
                ``output_format="json"``, or a CSV string when
                ``output_format="csv"``.
        """
        return await self.sql_adapter.sql(
            sql, params=params, output_format=output_format, **kwargs
        )

    async def similarity_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        table_name: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Vector similarity search against a single table.

        Args:
            text_or_texts: Query text or list of query texts. Ignored
                when ``vector_or_vectors`` is supplied.
            table_name: Target table (single-table search).
            vector_or_vectors: A pre-computed query vector, or a list of
                vectors, to search with directly instead of embedding
                ``text_or_texts``. When supplied, no embedding model is
                required on the knowledge base.
            k: Maximum number of results to return.
            threshold: Optional maximum vector-distance threshold.
            ef_search: HNSW search-time candidate-list depth.
                ``None`` keeps the index-time value (or the engine
                default). Higher = better recall, slower query.
            output_format: ``"json"`` (default, list of dicts —
                JSON-shaped Python data) or ``"csv"`` (CSV string,
                useful for handing results to an LM since CSV is
                ~30-50% fewer tokens than equivalent JSON).
        """
        return await self.sql_adapter.similarity_search(
            text_or_texts,
            table_name=table_name,
            vector_or_vectors=vector_or_vectors,
            k=k,
            threshold=threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def fulltext_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        table_name: str,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        bm25_k: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 full-text search against a single table.

        Args:
            text_or_texts: Query text or list of query texts.
            table_name: Target table.
            k: Maximum number of results.
            threshold: Optional minimum BM25 score.
            conjunctive: AND-mode query (every term must match).
                Default ``False`` keeps OR semantics.
            bm25_b: Optional override for BM25's ``b`` parameter
                (document-length normalization).
            bm25_k: Optional override for BM25's ``k1`` parameter
                (term-frequency saturation).
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
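
        To make the roles of ``bm25_b`` and ``bm25_k`` concrete, here
        is a minimal sketch of the textbook Okapi BM25 per-term score.
        ``bm25_term_score`` is a hypothetical helper, and the defaults
        ``k1=1.2`` and ``b=0.75`` are the conventional values, assumed
        here rather than read from the adapter.

        ```python
        def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.2, b=0.75):
            # b=0 ignores document length; b=1 applies full normalization.
            length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
            # k1 sets how quickly repeated occurrences stop adding score.
            return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)

        # Same term frequency, different document lengths.
        short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, idf=1.0)
        long_doc = bm25_term_score(tf=2, doc_len=400, avg_doc_len=100, idf=1.0)
        # With b=0 the document length no longer affects the score.
        flat_short = bm25_term_score(2, 50, 100, 1.0, b=0.0)
        flat_long = bm25_term_score(2, 400, 100, 1.0, b=0.0)
        ```

        With length normalization on, the shorter document scores
        higher for the same term frequency; with ``b=0`` both documents
        score the same.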
        """
        return await self.sql_adapter.fulltext_search(
            text_or_texts,
            table_name=table_name,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            bm25_k=bm25_k,
            output_format=output_format,
        )

    async def regex_search(
        self,
        pattern: str,
        *,
        table_name: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        output_format: str = "json",
    ):
        """Find rows whose string fields match a regular expression.

        DuckDB evaluates regexes with RE2, so patterns are linear-time
        and not vulnerable to catastrophic backtracking.

        Args:
            pattern: The regex pattern (RE2 syntax).
            table_name: Target table.
            fields: Field names to match against. Defaults to every
                string field on the schema. Names are snake_case-
                normalized to match stored column names.
            case_sensitive: When ``False``, match case-insensitively.
            k: Maximum number of results.
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
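
        A rough in-memory sketch of the matching semantics described
        above. ``regex_filter`` is a hypothetical stand-in, not the
        adapter's query. It assumes partial (search-style) matching and
        uses Python's ``re`` module, which is a backtracking engine and
        therefore does not share RE2's linear-time guarantee.

        ```python
        import re

        def regex_filter(rows, pattern, fields=None, case_sensitive=True, k=10):
            flags = 0 if case_sensitive else re.IGNORECASE
            compiled = re.compile(pattern, flags)
            matches = []
            for row in rows:
                # Default to every string-valued field when no subset is given.
                names = fields if fields is not None else [
                    name for name, value in row.items() if isinstance(value, str)
                ]
                # A row is kept when at least one selected field matches.
                if any(compiled.search(str(row.get(name) or "")) for name in names):
                    matches.append(row)
                if len(matches) == k:
                    break
            return matches

        rows = [
            {"id": "1", "title": "Hello", "content": "Hello World!"},
            {"id": "2", "title": "Notes", "content": "hello again"},
            {"id": "3", "title": "Other", "content": "nothing here"},
        ]
        loose = regex_filter(rows, "^hello", fields=["content"], case_sensitive=False)
        strict = regex_filter(rows, "^hello", fields=["content"])
        ```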
        """
        return await self.sql_adapter.regex_search(
            pattern,
            table_name=table_name,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            output_format=output_format,
        )

    async def hybrid_fts_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        keywords: Optional[Union[str, List[str]]] = None,
        table_name: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        bm25_k: Optional[float] = None,
        output_format: str = "json",
    ):
        """Reciprocal-Rank-Fusion of vector similarity + BM25 fulltext.

        Falls back to full-text-only when there are no vectors to search
        with. The regex-side sibling is `hybrid_regex_search`.

        Args:
            text_or_texts: Query text or list of query texts for the
                vector branch. Ignored when ``vector_or_vectors`` is
                supplied.
            keywords: Query text(s) for the BM25 branch.
            table_name: Target table.
            vector_or_vectors: Pre-computed query vector(s) for the
                vector branch, used directly instead of embedding text.
            k: Maximum results.
            k_rank: RRF smoothing constant (default: 60). Lower
                values weight the top ranks more heavily.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 threshold.
            ef_search: Forwarded to the vector branch; HNSW
                search-time candidate-list depth.
            conjunctive: Forwarded to the BM25 branch; AND-mode query.
            bm25_b: Forwarded to the BM25 branch; document-length
                normalization override.
            bm25_k: Forwarded to the BM25 branch; term-frequency
                saturation override.
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
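
        A minimal sketch of Reciprocal Rank Fusion, assuming the
        commonly used formula ``1 / (k_rank + rank)`` with 1-based
        ranks. ``rrf_fuse`` is a hypothetical helper; the adapter's
        exact ranking and tie-breaking may differ.

        ```python
        def rrf_fuse(rankings, k_rank=60, k=10):
            # rankings: one best-first list of row ids per branch.
            scores = {}
            for ranking in rankings:
                for rank, row_id in enumerate(ranking, start=1):
                    scores[row_id] = scores.get(row_id, 0.0) + 1.0 / (k_rank + rank)
            # Highest fused score first, truncated to k results.
            fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
            return fused[:k]

        vector_ranking = ["a", "b", "c"]  # best-first ids from the vector branch
        bm25_ranking = ["b", "d", "a"]  # best-first ids from the BM25 branch
        fused = rrf_fuse([vector_ranking, bm25_ranking])
        fused_ids = [row_id for row_id, _ in fused]
        ```

        Rows that appear in both branches (``"a"`` and ``"b"``) outrank
        rows found by only one, which is the point of the fusion.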
        """
        return await self.sql_adapter.hybrid_fts_search(
            text_or_texts=text_or_texts,
            table_name=table_name,
            keywords=keywords,
            vector_or_vectors=vector_or_vectors,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            bm25_k=bm25_k,
            output_format=output_format,
        )

    async def hybrid_search(self, *args, **kwargs):
        """Deprecated alias of `hybrid_fts_search`.

        Kept for backwards compatibility. The new name is symmetric
        with `hybrid_regex_search`; prefer it in new code.
        """
        return await self.hybrid_fts_search(*args, **kwargs)

    async def hybrid_regex_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        table_name: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        output_format: str = "json",
    ):
        """Reciprocal-Rank-Fusion of vector similarity + regex.

        The regex-side counterpart to `hybrid_fts_search` (which
        pairs vector with BM25 fulltext). The two signals are
        orthogonal: vectors capture semantic similarity, regex
        captures exact textual shape. Ranks are fused with the same
        RRF formula.

        Args:
            text_or_texts: Natural-language query (or list) for the
                vector side. Ignored when ``vector_or_vectors`` is
                supplied.
            pattern_or_patterns: RE2 pattern (or list) for the regex
                side. ``None`` falls back to plain similarity search.
            table_name: Target table.
            vector_or_vectors: Pre-computed query vector(s) for the
                vector side, used directly instead of embedding text.
            k: Maximum results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Vector-distance threshold.
            ef_search: Forwarded to the vector branch; HNSW
                search-time candidate-list depth.
            fields: Forwarded to the regex side.
            case_sensitive: Forwarded to the regex side.
            output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
        """
        return await self.sql_adapter.hybrid_regex_search(
            text_or_texts=text_or_texts,
            pattern_or_patterns=pattern_or_patterns,
            table_name=table_name,
            vector_or_vectors=vector_or_vectors,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            ef_search=ef_search,
            fields=fields,
            case_sensitive=case_sensitive,
            output_format=output_format,
        )

    # ---------------------------------------------------------------------
    # Graph store API — orthogonal to the SQL store above.
    #
    # These methods require the underlying adapter to be a
    # ``GraphDatabaseAdapter`` (selected by the URI scheme, e.g.
    # ``ladybug://``). Calling them on a SQL-only KnowledgeBase raises
    # ``NotImplementedError`` with a clear message instead of an opaque
    # ``AttributeError``.
    # ---------------------------------------------------------------------

    def _require_graph_adapter(self) -> None:
        """Raise if no graph adapter is attached to this KnowledgeBase.

        The graph adapter is absent on a SQL-only KnowledgeBase (one
        built with ``uri=`` alone). Calling a graph method on such an
        instance must fail with a clear message instead of an
        ``AttributeError`` from accessing ``None``.
        """
        if not isinstance(self.graph_adapter, GraphDatabaseAdapter):
            raise NotImplementedError(
                "Graph operations require a graph database adapter "
                "(pass graph_uri='ladybug://...' at construction time)."
            )

    async def update_entities(
        self,
        entity_or_entities: Union[Any, List[Any]],
    ) -> Union[Any, List[Any]]:
        """Insert or update one or more entities (nodes) in the graph.

        Graph-side counterpart of the SQL `update`. The name
        mirrors the `Entities` data model; pass either a single
        ``Entity`` or a list — the return shape matches the input.

        Args:
            entity_or_entities: An ``Entity`` instance, or a list of
                them (or anything satisfying ``is_entity``).

        Returns:
            The node id(s) assigned by the backend. Scalar in / scalar
            out; list in / list out.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.update_entities(entity_or_entities)

    async def update_relations(
        self,
        relation_or_relations: Union[Any, List[Any]],
    ) -> Union[Any, List[Any]]:
        """Insert or update one or more relations (edges) in the graph.

        Mirrors the `Relations` data model. Each relation's
        ``subj`` and ``obj`` are upserted as needed so every edge has
        both endpoints.

        Args:
            relation_or_relations: A ``Relation`` instance, or a list
                of them (or anything satisfying ``is_relation``).

        Returns:
            The edge id(s) assigned by the backend. Scalar in / scalar
            out; list in / list out.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.update_relations(relation_or_relations)

    async def update_knowledge_graph(self, knowledge_graph: Any) -> Any:
        """Bulk-insert a full knowledge graph (entities + relations).

        Equivalent to calling `update_entities` then
        `update_relations`, but concrete adapters may optimize
        the combined path.

        Args:
            knowledge_graph: A ``KnowledgeGraph`` instance.

        Returns:
            A dict of the form ``{"entities": [...ids...], "relations":
            [...ids...]}``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.update_knowledge_graph(knowledge_graph)

    async def get_entity(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        label: str,
    ) -> Union[Optional[Any], List[Optional[Any]]]:
        """Retrieve one or more entities by primary key from a label.

        Args:
            id_or_ids: A single primary key value, or a list of values.
            label: The entity label (node type).

        Returns:
            A single ``JsonDataModel`` (or ``None``) for a scalar
            argument; a list (with ``None`` for misses) for a list
            argument.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.get_entity(id_or_ids, label=label)

    async def delete_entity(
        self,
        id_or_ids: Union[Any, List[Any]],
        *,
        label: str,
    ) -> int:
        """Delete entities by primary key from a label.

        Incident relations are removed by the adapter.

        Args:
            id_or_ids: Primary key value, or a list of values.
            label: The entity label.

        Returns:
            The number of entities actually deleted.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.delete_entity(id_or_ids, label=label)

    async def delete_relation(
        self,
        *,
        label: str,
        source_id: Any,
        target_id: Any,
    ) -> int:
        """Delete a relation between two entities.

        Args:
            label: The relation label.
            source_id: The subject (source) entity's primary key.
            target_id: The object (target) entity's primary key.

        Returns:
            The number of edges actually deleted.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.delete_relation(
            label=label, source_id=source_id, target_id=target_id
        )

    async def cypher(
        self,
        query: str,
        *,
        params: Optional[Dict[str, Any]] = None,
        output_format: str = "json",
        **kwargs: Any,
    ) -> Union[List[Dict[str, Any]], str]:
        """Execute a raw Cypher query against the graph.

        The graph-store counterpart to `query` (which executes
        SQL). Kept under a distinct name to avoid ambiguity, since the
        KnowledgeBase exposes both surfaces.

        Args:
            query: The Cypher query string.
            params: Optional parameters for parameterized queries.
            output_format: ``"json"`` (default) or ``"csv"``.
            **kwargs: Adapter-specific options (e.g. ``read_only``).

        Returns:
            A list of dicts when ``output_format="json"``, or a CSV
            string when ``output_format="csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.cypher(
            query, params=params, output_format=output_format, **kwargs
        )

    async def entity_similarity_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Vector similarity search over entities of a given label.

        Args:
            text_or_texts: Query text or list of query texts. Ignored
                when ``vector_or_vectors`` is supplied.
            label: The entity label to search within.
            vector_or_vectors: Pre-computed query vector or list of
                vectors to search with directly (no embedding model
                required).
            k: Maximum number of results.
            threshold: Optional vector-distance threshold.
            ef_search: Engine-specific search-time recall knob (HNSW
                ``efs``). Higher = better recall but slower.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_similarity_search(
            text_or_texts,
            label=label,
            vector_or_vectors=vector_or_vectors,
            k=k,
            threshold=threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def entity_fulltext_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 full-text search over entities of a given label.

        Args:
            text_or_texts: Query text or list of query texts.
            label: The entity label to search within.
            k: Maximum number of results.
            threshold: Optional minimum BM25 score.
            conjunctive: AND-mode query (every term must match).
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_fulltext_search(
            text_or_texts,
            label=label,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def entity_regex_search(
        self,
        pattern: str,
        *,
        label: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        output_format: str = "json",
    ):
        """Regex search over entities of a label.

        Graph-side counterpart of `regex_search`. Applies the
        pattern to every indexed string field on the entity (or to
        the caller-supplied subset via ``fields``) and returns the
        rows in which at least one of those fields matches.

        Args:
            pattern: The regex pattern.
            label: The entity label to search within.
            fields: Optional whitelist of fields.
            case_sensitive: When ``False``, matches case-insensitively.
            k: Maximum number of rows.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_regex_search(
            pattern,
            label=label,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            output_format=output_format,
        )

    async def entity_hybrid_regex_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF fusion of vector similarity + regex match over entities.

        Sibling of `entity_hybrid_fts_search`. Falls through
        to `entity_similarity_search` when no patterns are
        supplied; falls through to `entity_regex_search` when
        there are no vectors to search with.

        Args:
            text_or_texts: Query text or list of query texts for the
                vector branch. Ignored when ``vector_or_vectors`` is
                supplied.
            pattern_or_patterns: Regex pattern (or list) for the
                regex branch. ``None`` skips the regex side.
            label: The entity label.
            vector_or_vectors: Pre-computed query vector(s) for the
                vector branch, used directly instead of embedding text.
            fields: Forwarded to `entity_regex_search`.
            case_sensitive: Forwarded to `entity_regex_search`.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_hybrid_regex_search(
            text_or_texts=text_or_texts,
            pattern_or_patterns=pattern_or_patterns,
            label=label,
            vector_or_vectors=vector_or_vectors,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            output_format=output_format,
        )

    async def entity_hybrid_fts_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        keywords: Optional[Union[str, List[str]]] = None,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF of vector similarity + BM25 fulltext over entities of a label.

        Graph-side counterpart of `hybrid_fts_search`.

        Args:
            text_or_texts: Query text or list of query texts. Ignored
                when ``vector_or_vectors`` is supplied.
            keywords: Query text(s) for the BM25 branch.
            label: The entity label to search within.
            vector_or_vectors: Pre-computed query vector(s) for the
                vector branch, used directly instead of embedding text.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 threshold.
            ef_search: HNSW ``efs`` knob for the vector branch.
            conjunctive: AND vs OR for the BM25 branch.
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.entity_hybrid_fts_search(
            text_or_texts=text_or_texts,
            label=label,
            keywords=keywords,
            vector_or_vectors=vector_or_vectors,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def relation_similarity_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Vector similarity search over relations of a given label.

        The query matches against BOTH endpoints (subject and
        object); the adapter returns one row per matched edge with
        its best (lowest) distance and a ``matched_on`` tag
        (``"subj"``, ``"obj"``, or ``"both"``).

        Args:
            text_or_texts: Query text or list of query texts. Ignored
                when ``vector_or_vectors`` is supplied.
            label: The relation label to search within.
            vector_or_vectors: Pre-computed query vector or list of
                vectors to search with directly (matched against both
                endpoints).
            k: Maximum number of results.
            threshold: Optional vector-distance threshold per endpoint.
            ef_search: HNSW ``efs`` knob applied to both endpoint
                vector searches.
            output_format: ``"json"`` (default) or ``"csv"``.
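
        A sketch of the per-edge merge described above, assuming each
        endpoint search yields a mapping from edge id to distance.
        ``merge_endpoint_hits`` and the row keys ``edge`` and
        ``distance`` are hypothetical; only ``matched_on`` and its
        three values come from this docstring.

        ```python
        def merge_endpoint_hits(subj_hits, obj_hits, k=10):
            # subj_hits / obj_hits: {edge_id: distance} per endpoint search.
            rows = []
            for edge_id in set(subj_hits) | set(obj_hits):
                in_subj = edge_id in subj_hits
                in_obj = edge_id in obj_hits
                if in_subj and in_obj:
                    # Keep the best (lowest) of the two distances.
                    matched_on = "both"
                    distance = min(subj_hits[edge_id], obj_hits[edge_id])
                elif in_subj:
                    matched_on, distance = "subj", subj_hits[edge_id]
                else:
                    matched_on, distance = "obj", obj_hits[edge_id]
                rows.append(
                    {"edge": edge_id, "distance": distance, "matched_on": matched_on}
                )
            # Lowest distance first; edge id breaks ties deterministically.
            rows.sort(key=lambda row: (row["distance"], row["edge"]))
            return rows[:k]

        merged = merge_endpoint_hits(
            {"e1": 0.30, "e2": 0.10},
            {"e1": 0.20, "e3": 0.25},
        )
        ```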
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_similarity_search(
            text_or_texts,
            label=label,
            vector_or_vectors=vector_or_vectors,
            k=k,
            threshold=threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def relation_fulltext_search(
        self,
        text_or_texts: Union[str, List[str]],
        *,
        label: str,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 fulltext search over relations of a given label.

        Either-endpoint union: an edge surfaces if either endpoint
        matched, and its final ``score`` is the sum of the
        subject-side and object-side BM25 scores.

        Args:
            text_or_texts: Query text or list of query texts.
            label: The relation label to search within.
            k: Maximum number of results.
            threshold: Optional minimum BM25 score, applied per endpoint.
            conjunctive: AND-mode query (every term must match).
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_fulltext_search(
            text_or_texts,
            label=label,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def relation_regex_search(
        self,
        pattern: str,
        *,
        label: str,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        output_format: str = "json",
    ):
        """Regex search over relations of a given label.

        Composed via `entity_regex_search` on each endpoint.
        Regex hits are binary; the row's ``score`` is 2.0 when both
        endpoints matched and 1.0 when only one did, with
        ``matched_on`` indicating the side(s).

        Args:
            pattern: The regex pattern.
            label: The relation label to search within.
            fields: Optional whitelist of fields, applied to both endpoints.
            case_sensitive: When ``False``, matches case-insensitively.
            k: Maximum number of rows.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_regex_search(
            pattern,
            label=label,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            output_format=output_format,
        )

    async def relation_hybrid_regex_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF of vector similarity + regex match over relations.

        Per matched edge, the final ``rrf_score`` is the sum of the
        subject's and the object's hybrid scores — same 4-source-RRF
        reduction as `relation_hybrid_fts_search`. Falls through
        to `relation_similarity_search` when no patterns are
        supplied.

        Args:
            text_or_texts: Query text or list of query texts for the
                vector branch. Ignored when ``vector_or_vectors`` is
                supplied.
            pattern_or_patterns: Regex pattern (or list) for the regex branch.
            label: The relation label.
            vector_or_vectors: Pre-computed query vector(s) for the
                vector branch, matched against both endpoints.
            fields: Forwarded to `entity_regex_search`.
            case_sensitive: Forwarded to `entity_regex_search`.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_hybrid_regex_search(
            text_or_texts=text_or_texts,
            pattern_or_patterns=pattern_or_patterns,
            label=label,
            vector_or_vectors=vector_or_vectors,
            fields=fields,
            case_sensitive=case_sensitive,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            output_format=output_format,
        )

    async def relation_hybrid_fts_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        keywords: Optional[Union[str, List[str]]] = None,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """RRF of vector + BM25 fulltext over relations of a label.

        Either-endpoint union: per matched edge, the final
        ``rrf_score`` is the sum of the subject-side and
        object-side hybrid scores — equivalent to a 4-source RRF.
        Falls back to fulltext-only when there are no vectors to
        search with.

        Args:
            text_or_texts: Query text or list of query texts. Ignored
                when ``vector_or_vectors`` is supplied.
            keywords: Query text(s) for the BM25 branch.
            label: The relation label to search within.
            vector_or_vectors: Pre-computed query vector(s) for the
                vector branch, matched against both endpoints.
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 score threshold.
            ef_search: HNSW ``efs`` knob for the vector branch.
            conjunctive: AND vs OR for the BM25 branch.
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
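
        A small sketch of why summing the two per-endpoint hybrid
        scores is equivalent to a 4-source RRF. It assumes the
        ``1 / (k_rank + rank)`` formula with 1-based ranks and that
        every source ranking is expressed over edge ids. ``rrf_scores``
        and the rankings are hypothetical illustration data.

        ```python
        def rrf_scores(rankings, k_rank=60):
            # Sum 1 / (k_rank + rank) over every ranking an edge appears in.
            scores = {}
            for ranking in rankings:
                for rank, edge_id in enumerate(ranking, start=1):
                    scores[edge_id] = scores.get(edge_id, 0.0) + 1.0 / (k_rank + rank)
            return scores

        # Best-first edge rankings for each of the four sources.
        subj_vec, subj_fts = ["e1", "e2"], ["e2", "e3"]
        obj_vec, obj_fts = ["e3", "e1"], ["e1"]

        # Two 2-source fusions, one per endpoint, then summed per edge.
        subj_side = rrf_scores([subj_vec, subj_fts])
        obj_side = rrf_scores([obj_vec, obj_fts])
        summed = {
            edge: subj_side.get(edge, 0.0) + obj_side.get(edge, 0.0)
            for edge in set(subj_side) | set(obj_side)
        }

        # One fusion over all four rankings at once.
        four_source = rrf_scores([subj_vec, subj_fts, obj_vec, obj_fts])
        ```

        Because RRF is a plain sum of per-source terms, ``summed`` and
        ``four_source`` agree for every edge up to floating-point
        rounding.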
        """
        self._require_graph_adapter()
        return await self.graph_adapter.relation_hybrid_fts_search(
            text_or_texts=text_or_texts,
            label=label,
            keywords=keywords,
            vector_or_vectors=vector_or_vectors,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def path_hybrid_fts_search(
        self,
        subj_text_or_texts: Optional[Union[str, List[str]]] = None,
        obj_text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        subj_keywords: Optional[Union[str, List[str]]] = None,
        obj_keywords: Optional[Union[str, List[str]]] = None,
        subj_label: str,
        obj_label: str,
        subj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        obj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """Hybrid variable-length path search where BOTH endpoints match.

        AND semantics: each side is hybrid-searched (vector + BM25
        fulltext) independently, and for each matching path the
        ``rrf_score`` is the sum of the subject-side and object-side
        hybrid scores.
        Falls back to fulltext-only when there are no vectors to
        search with on a side.

        Args:
            subj_text_or_texts: Query text (or list) for the subject.
                Ignored when ``subj_vector_or_vectors`` is supplied.
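            obj_text_or_texts: Query text (or list) for the object.
                Ignored when ``obj_vector_or_vectors`` is supplied.
            subj_keywords: Query text(s) for the subject-side BM25 branch.
            obj_keywords: Query text(s) for the object-side BM25 branch.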
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            subj_vector_or_vectors: Pre-computed subject query vector(s).
            obj_vector_or_vectors: Pre-computed object query vector(s).
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fulltext_threshold: Optional BM25 score threshold.
            ef_search: HNSW ``efs`` knob applied to both endpoints.
            conjunctive: When ``True``, the BM25 branch requires every
                query term to match (AND); otherwise any term may match (OR).
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_hybrid_fts_search(
            subj_text_or_texts=subj_text_or_texts,
            obj_text_or_texts=obj_text_or_texts,
            subj_label=subj_label,
            obj_label=obj_label,
            subj_keywords=subj_keywords,
            obj_keywords=obj_keywords,
            subj_vector_or_vectors=subj_vector_or_vectors,
            obj_vector_or_vectors=obj_vector_or_vectors,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fulltext_threshold=fulltext_threshold,
            ef_search=ef_search,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def path_similarity_search(
        self,
        subj_text_or_texts: Optional[Union[str, List[str]]] = None,
        obj_text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        subj_label: str,
        obj_label: str,
        subj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        obj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        subj_threshold: Optional[float] = None,
        obj_threshold: Optional[float] = None,
        ef_search: Optional[int] = None,
        output_format: str = "json",
    ):
        """Variable-length path search where BOTH endpoints match.

        Returns paths of ``min_hops..max_hops`` edges whose start
        node is vector-close to the subject query AND whose end node
        is vector-close to the object query. ``label`` is an optional
        rel-label constraint applied to every hop; when omitted, any
        edge type is allowed.

        Each row carries the full path: ``nodes`` (every node along
        the way, endpoints included), ``rels`` (every edge), and
        ``length`` (hop count), alongside the two endpoint distances
        and flattened endpoint PKs.

        Args:
            subj_text_or_texts: Query text (or list) for the subject.
                Ignored when ``subj_vector_or_vectors`` is supplied.
            obj_text_or_texts: Query text (or list) for the object.
                Ignored when ``obj_vector_or_vectors`` is supplied.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            subj_vector_or_vectors: Pre-computed subject query vector(s).
            obj_vector_or_vectors: Pre-computed object query vector(s).
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            subj_threshold: Optional subject-side distance threshold.
            obj_threshold: Optional object-side distance threshold.
            ef_search: Optional HNSW search-depth knob applied to both
                endpoints.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_similarity_search(
            subj_text_or_texts,
            obj_text_or_texts,
            subj_label=subj_label,
            obj_label=obj_label,
            subj_vector_or_vectors=subj_vector_or_vectors,
            obj_vector_or_vectors=obj_vector_or_vectors,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            subj_threshold=subj_threshold,
            obj_threshold=obj_threshold,
            ef_search=ef_search,
            output_format=output_format,
        )

    async def path_fulltext_search(
        self,
        subj_text_or_texts: Union[str, List[str]],
        obj_text_or_texts: Union[str, List[str]],
        *,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        threshold: Optional[float] = None,
        conjunctive: bool = False,
        bm25_b: Optional[float] = None,
        output_format: str = "json",
    ):
        """BM25 variable-length path search, AND semantics.

        Same shape as `path_similarity_search` but driven by BM25
        fulltext on each endpoint. Per matched path, ``score`` is the
        sum of the subject-side and object-side BM25 scores.

        Args:
            subj_text_or_texts: Keyword query (or list) for the subject.
            obj_text_or_texts: Keyword query (or list) for the object.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            threshold: Optional minimum BM25 threshold per endpoint.
            conjunctive: When ``True``, every query term must match
                (AND); otherwise any term may match (OR).
            bm25_b: Optional override for BM25's ``b`` parameter.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_fulltext_search(
            subj_text_or_texts=subj_text_or_texts,
            obj_text_or_texts=obj_text_or_texts,
            subj_label=subj_label,
            obj_label=obj_label,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            threshold=threshold,
            conjunctive=conjunctive,
            bm25_b=bm25_b,
            output_format=output_format,
        )

    async def path_regex_search(
        self,
        subj_pattern: str,
        obj_pattern: str,
        *,
        subj_label: str,
        obj_label: str,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        output_format: str = "json",
    ):
        """Regex variable-length path search, AND semantics.

        Both endpoints must match their respective regex pattern.
        A regex match is binary (there is no relevance score), so
        paths are ranked by length, shortest first.

        Args:
            subj_pattern: Regex pattern for the subject endpoint.
            obj_pattern: Regex pattern for the object endpoint.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            fields: Optional whitelist of fields, applied to both endpoints.
            case_sensitive: When ``False``, matches case-insensitively.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_regex_search(
            subj_pattern=subj_pattern,
            obj_pattern=obj_pattern,
            subj_label=subj_label,
            obj_label=obj_label,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            fields=fields,
            case_sensitive=case_sensitive,
            output_format=output_format,
        )

    async def path_hybrid_regex_search(
        self,
        subj_text_or_texts: Optional[Union[str, List[str]]] = None,
        obj_text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        subj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        obj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
        subj_label: str,
        obj_label: str,
        subj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        obj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        label: Optional[str] = None,
        min_hops: int = 1,
        max_hops: int = 3,
        k: int = 10,
        k_rank: int = 60,
        similarity_threshold: Optional[float] = None,
        fields: Optional[List[str]] = None,
        case_sensitive: bool = True,
        output_format: str = "json",
    ):
        """RRF of vector + regex variable-length path search, AND semantics.

        Each side is hybrid-searched (vec + regex) independently; the
        path's ``rrf_score`` is the sum of the two endpoint hybrid
        scores. Falls through to `path_similarity_search` when
        no patterns are supplied. Each side's vector branch can be
        driven by pre-computed vectors instead of text.

        Args:
            subj_text_or_texts: Query text (or list) for the subject
                vector branch. Ignored when ``subj_vector_or_vectors``
                is supplied.
            obj_text_or_texts: Query text (or list) for the object
                vector branch. Ignored when ``obj_vector_or_vectors``
                is supplied.
            subj_pattern_or_patterns: Regex pattern (or list) for the subject.
            obj_pattern_or_patterns: Regex pattern (or list) for the object.
            subj_label: Entity label of the subject endpoint.
            obj_label: Entity label of the object endpoint.
            subj_vector_or_vectors: Pre-computed subject query vector(s).
            obj_vector_or_vectors: Pre-computed object query vector(s).
            label: Optional rel-label constraint for every hop.
            min_hops: Minimum hop count, inclusive (default: 1).
            max_hops: Maximum hop count, inclusive (default: 3).
            k: Maximum number of results.
            k_rank: RRF smoothing constant.
            similarity_threshold: Optional vector-distance threshold.
            fields: Forwarded to the regex branch.
            case_sensitive: Forwarded to the regex branch.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.path_hybrid_regex_search(
            subj_text_or_texts=subj_text_or_texts,
            obj_text_or_texts=obj_text_or_texts,
            subj_pattern_or_patterns=subj_pattern_or_patterns,
            obj_pattern_or_patterns=obj_pattern_or_patterns,
            subj_label=subj_label,
            obj_label=obj_label,
            subj_vector_or_vectors=subj_vector_or_vectors,
            obj_vector_or_vectors=obj_vector_or_vectors,
            label=label,
            min_hops=min_hops,
            max_hops=max_hops,
            k=k,
            k_rank=k_rank,
            similarity_threshold=similarity_threshold,
            fields=fields,
            case_sensitive=case_sensitive,
            output_format=output_format,
        )

    def get_symbolic_data_models(self) -> List[Any]:
        """Retrieve all symbolic data models (table definitions) from the database.

        Returns a list of SymbolicDataModel objects representing each table
        in the database. This is useful for introspecting the database schema
        or for passing to search methods to limit the search scope.

        Returns:
            list: List of symbolic data models representing the database tables.

        Example:
            ```python
            symbolic_models = knowledge_base.get_symbolic_data_models()
            for model in symbolic_models:
                schema = model.get_schema()
                print(f"Table: {schema['title']}")
                print(f"Fields: {list(schema['properties'].keys())}")
            ```
        """
        return self.sql_adapter.get_symbolic_data_models()

    def get_symbolic_entities(self) -> List[Any]:
        """Retrieve a ``SymbolicDataModel`` per node label in the graph.

        Graph-side counterpart of `get_symbolic_data_models`,
        split by graph role: returns only entity (node) schemas.
        Each schema carries a ``label`` ``const`` discriminator and
        one property per stored column.

        Returns:
            list[SymbolicDataModel]: one per existing node label.
        """
        self._require_graph_adapter()
        return self.graph_adapter.get_symbolic_entities()

    def get_symbolic_relations(self) -> List[Any]:
        """Retrieve a ``SymbolicDataModel`` per relation label in the graph.

        Each returned schema includes its endpoint node schemas under
        ``$defs`` and references them as ``subj`` / ``obj`` via
        ``$ref`` — same shape Pydantic v2 emits for a hand-written
        `synalinks.Relation` subclass.

        Returns:
            list[SymbolicDataModel]: one per existing relation label.
        """
        self._require_graph_adapter()
        return self.graph_adapter.get_symbolic_relations()

    async def detect_communities(
        self,
        *,
        algorithm: str = "louvain",
        node_labels: Optional[List[str]] = None,
        rel_labels: Optional[List[str]] = None,
        max_iterations: Optional[int] = None,
    ) -> Any:
        """Run a community-detection algorithm on the graph store.

        Returns a `KnowledgeGraphs` — one
        `KnowledgeGraph` per detected community. Edges that
        straddle communities are dropped. See the adapter's
        documentation for algorithm-specific constraints (Louvain
        requires a single node label; WCC / SCC accept any number).

        Args:
            algorithm: ``"louvain"`` (default),
                ``"weakly_connected_components"``, or
                ``"strongly_connected_components"``.
            node_labels: Optional whitelist of NODE tables to
                project. ``None`` = every existing one.
            rel_labels: Optional whitelist of REL tables to project.
                ``None`` = every existing one.
            max_iterations: Optional upper bound on the algorithm's
                iteration count. ``None`` defers to the engine
                default.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.detect_communities(
            algorithm=algorithm,
            node_labels=node_labels,
            rel_labels=rel_labels,
            max_iterations=max_iterations,
        )

    async def pagerank(
        self,
        *,
        node_labels: Optional[List[str]] = None,
        rel_labels: Optional[List[str]] = None,
        damping_factor: float = 0.85,
        max_iterations: int = 100,
        tolerance: Optional[float] = None,
        normalize_initial: Optional[bool] = None,
        k: Optional[int] = None,
        output_format: str = "json",
    ):
        """Rank entities by PageRank importance on the graph store.

        Returns rows shaped like
        ``{<pk_column>: <pk_value>, "label": <label>, "node": <full node>,
        "rank": <float>}`` sorted by ``rank`` descending. The per-label
        PK column name is preserved verbatim, mirroring
        `entity_similarity_search`.

        Args:
            node_labels: Optional whitelist of NODE tables. ``None``
                projects every existing one.
            rel_labels: Optional whitelist of REL tables. ``None``
                projects every existing one.
            damping_factor: Probability of following an edge vs
                teleporting; 0.85 is the textbook value.
            max_iterations: Maximum number of iterations to run when
                the ranks have not converged earlier.
            tolerance: Optional convergence threshold; the algorithm
                stops early when the L1 change between iterations
                falls below this value. ``None`` defers to the
                engine default.
            normalize_initial: Whether to normalize the initial rank
                vector. ``None`` defers to the engine default.
            k: Optional cap on returned rows.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.pagerank(
            node_labels=node_labels,
            rel_labels=rel_labels,
            damping_factor=damping_factor,
            max_iterations=max_iterations,
            tolerance=tolerance,
            normalize_initial=normalize_initial,
            k=k,
            output_format=output_format,
        )

    async def local_graph_search(
        self,
        text_or_texts: Optional[Union[str, List[str]]] = None,
        *,
        label: str,
        vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
        max_hops: int = 2,
        k: int = 10,
        threshold: Optional[float] = None,
        rel_label: Optional[str] = None,
        ef_search: Optional[int] = None,
    ):
        """GraphRAG-style *local* search on the graph store.

        Vector-matches ``k`` seed entities of ``label``, expands their
        ``max_hops`` undirected neighbourhood, and returns the deduped
        union as a `KnowledgeGraph` — the local context subgraph
        for entity-centric questions ("what does the graph say around
        *these* entities"). See
        `GraphDatabaseAdapter.local_graph_search`.

        Args:
            text_or_texts: Query text (or list); with a list, the
                per-query neighbourhoods are merged.
                Ignored when ``vector_or_vectors`` is supplied.
            label: Entity label whose vector index seeds the search.
            vector_or_vectors: Pre-computed seed vector(s), used directly
                instead of embedding ``text_or_texts``.
            max_hops: Neighbourhood radius in edges (>= 1, default 2).
            k: Number of seed entities per query text.
            threshold: Optional seed vector-distance ceiling.
            rel_label: Optional rel-label constraint per hop.
            ef_search: Optional HNSW search-depth for the seed lookup.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.local_graph_search(
            text_or_texts,
            label=label,
            vector_or_vectors=vector_or_vectors,
            max_hops=max_hops,
            k=k,
            threshold=threshold,
            rel_label=rel_label,
            ef_search=ef_search,
        )

    async def build_communities(
        self,
        *,
        algorithm: str = "louvain",
        node_labels: Optional[List[str]] = None,
        rel_labels: Optional[List[str]] = None,
        max_iterations: Optional[int] = None,
        with_pagerank: bool = True,
        damping_factor: float = 0.85,
    ) -> int:
        """Materialize community membership (and PageRank) onto nodes.

        The index-time half of GraphRAG-global: run once after loading
        the graph so `global_graph_search` can read precomputed
        ``community`` / ``rank`` properties instead of re-clustering on
        every query. Idempotent. See
        `GraphDatabaseAdapter.build_communities`.

        Args:
            algorithm: Community-detection algorithm; see
                `detect_communities`.
            node_labels: Optional NODE-table whitelist (``None`` = all).
            rel_labels: Optional REL-table whitelist (``None`` = all).
            max_iterations: Optional clustering iteration cap.
            with_pagerank: Also stamp a PageRank importance score.
            damping_factor: PageRank damping factor.

        Returns:
            (int): the number of nodes stamped.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.build_communities(
            algorithm=algorithm,
            node_labels=node_labels,
            rel_labels=rel_labels,
            max_iterations=max_iterations,
            with_pagerank=with_pagerank,
            damping_factor=damping_factor,
        )

    async def global_graph_search(
        self,
        *,
        node_labels: Optional[List[str]] = None,
        k: int = 10,
        members_per_community: int = 10,
        output_format: str = "json",
    ):
        """GraphRAG-style *global* search on the graph store.

        Rolls up the community / rank properties
        `build_communities` stamped into one aggregate row per
        community (size, total rank, representative members), ordered
        by importance — the theme-centric counterpart to
        `local_graph_search` ("what are the overall patterns
        across the *whole* graph"). Requires `build_communities`
        to have run first. See
        `GraphDatabaseAdapter.global_graph_search`.

        Args:
            node_labels: Optional NODE-table whitelist (``None`` = every
                stamped table).
            k: Maximum number of communities to return.
            members_per_community: Cap on members carried per community.
            output_format: ``"json"`` (default) or ``"csv"``.
        """
        self._require_graph_adapter()
        return await self.graph_adapter.global_graph_search(
            node_labels=node_labels,
            k=k,
            members_per_community=members_per_community,
            output_format=output_format,
        )

    def _serialize_models(self, models, key):
        """Serialize a list of DataModels to their symbolic form.

        Shared between ``data_models``, ``entity_models``, and
        ``relation_models`` since each list goes through the same
        symbolic-model conversion before serialization.
        """
        return [
            (
                serialization_lib.serialize_synalinks_object(
                    model.to_symbolic_data_model(
                        name=key + (f"_{i}_" if i > 0 else "_") + self.name
                    )
                )
                if not is_symbolic_data_model(model)
                else serialization_lib.serialize_synalinks_object(model)
            )
            for i, model in enumerate(models)
        ]

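    def get_config(self):
        """Return a serializable config that `from_config` can rebuild from.

        The embedding model is included only when one is set.
        """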
        config = {
            "uri": self.uri,
            "graph_uri": self.graph_uri,
            "name": self.name,
            "metric": self.metric,
            "wipe_on_start": self.wipe_on_start,
        }
        data_models_config = {
            "data_models": self._serialize_models(self.data_models, "data_model"),
            "entity_models": self._serialize_models(self.entity_models, "entity_model"),
            "relation_models": self._serialize_models(
                self.relation_models, "relation_model"
            ),
        }
        embedding_model_config = {}
        if self.embedding_model:
            embedding_model_config = {
                "embedding_model": serialization_lib.serialize_synalinks_object(
                    self.embedding_model,
                )
            }
        return {
            **data_models_config,
            **embedding_model_config,
            **config,
        }

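    @classmethod
    def from_config(cls, config):
        """Rebuild a ``KnowledgeBase`` from a `get_config` dict.

        The model entries are popped from ``config``, so the dict
        passed in is mutated; pass a copy to keep the original.
        """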
        def _deserialize(items):
            return [
                serialization_lib.deserialize_synalinks_object(item) for item in items
            ]

        data_models = _deserialize(config.pop("data_models", []))
        entity_models = _deserialize(config.pop("entity_models", []))
        relation_models = _deserialize(config.pop("relation_models", []))
        embedding_model = None
        if "embedding_model" in config:
            embedding_model = serialization_lib.deserialize_synalinks_object(
                config.pop("embedding_model"),
            )
        return cls(
            data_models=data_models,
            entity_models=entity_models,
            relation_models=relation_models,
            embedding_model=embedding_model,
            **config,
        )
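
The `path_hybrid_fts_search` docstring in the listing above says that each endpoint is hybrid-searched independently and that a path's `rrf_score` is the sum of the two endpoint scores, with `k_rank` as the RRF smoothing constant. The sketch below illustrates that idea in plain Python. It assumes the common RRF formula `1 / (k_rank + rank)` with 1-based ranks, which the source does not spell out, and the helper names (`rrf_scores`, `score_paths`) and sample ids are hypothetical; the adapter's actual computation may differ.

```python
from typing import Dict, List, Tuple


def rrf_scores(rankings: List[List[str]], k_rank: int = 60) -> Dict[str, float]:
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    Assumption: ranks are 1-based and every list contributes
    1 / (k_rank + rank) for each id it contains.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k_rank + rank)
    return scores


def score_paths(
    paths: List[Tuple[str, str]],
    subj_scores: Dict[str, float],
    obj_scores: Dict[str, float],
    k: int = 10,
) -> List[Tuple[str, str, float]]:
    """Keep paths whose BOTH endpoints matched; score = subject + object."""
    scored = []
    for start, end in paths:
        if start in subj_scores and end in obj_scores:  # AND semantics
            scored.append((start, end, subj_scores[start] + obj_scores[end]))
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:k]


# Hypothetical per-endpoint rankings: [vector branch, fulltext branch].
subj = rrf_scores([["alice", "bob"], ["alice"]])
obj = rrf_scores([["acme", "globex"], ["globex", "acme"]])

# Candidate paths reduced to (start node, end node) for brevity.
paths = [("alice", "acme"), ("bob", "globex"), ("carol", "acme")]
top = score_paths(paths, subj, obj)
```

Here `("carol", "acme")` is discarded because `carol` never matched the subject query, which is what AND semantics implies: a strong match on one endpoint cannot compensate for a miss on the other.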

`build_communities(*, algorithm='louvain', node_labels=None, rel_labels=None, max_iterations=None, with_pagerank=True, damping_factor=0.85)` `async`

Materialize community membership (and PageRank) onto nodes.

The index-time half of GraphRAG-global: run once after loading the graph so `global_graph_search` can read precomputed `community` / `rank` properties instead of re-clustering on every query. Idempotent. See `GraphDatabaseAdapter.build_communities`.

Parameters:

Name	Type	Description	Default
`algorithm`	`str`	Community-detection algorithm; see `detect_communities`.	`'louvain'`
`node_labels`	`Optional[List[str]]`	Optional NODE-table whitelist (`None` = all).	`None`
`rel_labels`	`Optional[List[str]]`	Optional REL-table whitelist (`None` = all).	`None`
`max_iterations`	`Optional[int]`	Optional clustering iteration cap.	`None`
`with_pagerank`	`bool`	Also stamp a PageRank importance score.	`True`
`damping_factor`	`float`	PageRank damping factor.	`0.85`

Returns:

Type	Description
`int`	the number of nodes stamped.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def build_communities(
    self,
    *,
    algorithm: str = "louvain",
    node_labels: Optional[List[str]] = None,
    rel_labels: Optional[List[str]] = None,
    max_iterations: Optional[int] = None,
    with_pagerank: bool = True,
    damping_factor: float = 0.85,
) -> int:
    """Materialize community membership (and PageRank) onto nodes.

    The index-time half of GraphRAG-global: run once after loading
    the graph so `global_graph_search` can read precomputed
    ``community`` / ``rank`` properties instead of re-clustering on
    every query. Idempotent. See
    `GraphDatabaseAdapter.build_communities`.

    Args:
        algorithm: Community-detection algorithm; see
            `detect_communities`.
        node_labels: Optional NODE-table whitelist (``None`` = all).
        rel_labels: Optional REL-table whitelist (``None`` = all).
        max_iterations: Optional clustering iteration cap.
        with_pagerank: Also stamp a PageRank importance score.
        damping_factor: PageRank damping factor.

    Returns:
        (int): the number of nodes stamped.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.build_communities(
        algorithm=algorithm,
        node_labels=node_labels,
        rel_labels=rel_labels,
        max_iterations=max_iterations,
        with_pagerank=with_pagerank,
        damping_factor=damping_factor,
    )
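
To make "materialize onto nodes" and "idempotent" concrete, here is a minimal in-memory sketch, not the adapter's implementation. It swaps in connected components for the clustering step (the real default is Louvain) and a basic power-iteration PageRank; the function names, the dict-based node store, and the sample graph are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def connected_components(node_ids: List[Hashable], edges: List[Edge]) -> Dict:
    """Stand-in clustering: label nodes by undirected connected component."""
    parent = {n: n for n in node_ids}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for src, dst in edges:
        parent[find(src)] = find(dst)
    roots = sorted({find(n) for n in node_ids})
    index = {root: i for i, root in enumerate(roots)}
    return {n: index[find(n)] for n in node_ids}


def pagerank(node_ids, edges, damping_factor=0.85, max_iterations=100):
    """Power iteration; rank held by dangling nodes is spread uniformly."""
    n = len(node_ids)
    rank = {node: 1.0 / n for node in node_ids}
    out = {node: [] for node in node_ids}
    for src, dst in edges:
        out[src].append(dst)
    for _ in range(max_iterations):
        dangling = sum(rank[node] for node in node_ids if not out[node])
        base = (1 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {node: base for node in node_ids}
        for src in node_ids:
            for dst in out[src]:
                new_rank[dst] += damping_factor * rank[src] / len(out[src])
        rank = new_rank
    return rank


def stamp_communities(nodes, edges, with_pagerank=True, damping_factor=0.85):
    """Overwrite `community` (and `rank`) on each node; return the node count."""
    ids = list(nodes)
    if not ids:
        return 0
    community = connected_components(ids, edges)
    ranks = pagerank(ids, edges, damping_factor) if with_pagerank else {}
    for node_id in ids:
        nodes[node_id]["community"] = community[node_id]
        if with_pagerank:
            nodes[node_id]["rank"] = ranks[node_id]
    return len(ids)


nodes = {"a": {}, "b": {}, "c": {}, "d": {}}
edges = [("a", "b"), ("b", "a"), ("c", "d")]
stamped = stamp_communities(nodes, edges)
snapshot = {node_id: dict(props) for node_id, props in nodes.items()}
stamped_again = stamp_communities(nodes, edges)  # second run changes nothing
```

Because the properties are overwritten rather than appended, running the step twice on an unchanged graph leaves the nodes exactly as they were, which is the sense of "idempotent" used here.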

`cypher(query, *, params=None, output_format='json', **kwargs)` `async`

Execute a raw Cypher query against the graph.

The graph-store counterpart to `query` (which executes SQL). Kept under a distinct name to avoid ambiguity when the `KnowledgeBase` grows both surfaces.

Parameters:

Name	Type	Description	Default
`query`	`str`	The Cypher query string.	required
`params`	`Optional[Dict[str, Any]]`	Optional parameters for parameterized queries.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`
`**kwargs`	`Any`	Adapter-specific options (e.g. `read_only`).	`{}`

Returns:

Type	Description
`Union[List[Dict[str, Any]], str]`	A list of dicts when `output_format="json"`, or a CSV string when `output_format="csv"`.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def cypher(
    self,
    query: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    output_format: str = "json",
    **kwargs: Any,
) -> Union[List[Dict[str, Any]], str]:
    """Execute a raw Cypher query against the graph.

    The graph-store counterpart to `query` (which executes
    SQL). Kept under a distinct name to avoid ambiguity when the
    KnowledgeBase grows both surfaces.

    Args:
        query: The Cypher query string.
        params: Optional parameters for parameterized queries.
        output_format: ``"json"`` (default) or ``"csv"``.
        **kwargs: Adapter-specific options (e.g. ``read_only``).

    Returns:
        A list of dicts when ``output_format="json"``, or a CSV
        string when ``output_format="csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.cypher(
        query, params=params, output_format=output_format, **kwargs
    )
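
The two return shapes of `cypher` can be pictured with a small stand-alone sketch. The helper `format_rows` is hypothetical and only mimics the documented contract (list of dicts for `"json"`, one string for `"csv"`); the exact CSV dialect the adapter emits (header row, quoting, line endings) is an assumption here.

```python
import csv
import io
from typing import Any, Dict, List, Union


def format_rows(
    rows: List[Dict[str, Any]],
    output_format: str = "json",
) -> Union[List[Dict[str, Any]], str]:
    """Return rows unchanged for "json", or serialized as one CSV string."""
    if output_format == "json":
        return rows
    if output_format == "csv":
        if not rows:
            return ""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=list(rows[0].keys()),  # column order of the first row
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"Unsupported output_format: {output_format!r}")


# Hypothetical result rows, as a graph query might return them.
rows = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]
as_json = format_rows(rows)
as_csv = format_rows(rows, output_format="csv")
```

With a real knowledge base, a parameterized call might look like `await knowledge_base.cypher("MATCH (p:Person) WHERE p.name = $name RETURN p.name", params={"name": "Ada"})`. The `Person` label is hypothetical, and the `$name` placeholder syntax is assumed from common Cypher dialects rather than confirmed by this page.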

`delete(id_or_ids, *, table_name)` `async`

Delete records by primary key from a single table.

Pass a single id or a list. The FTS / vector indexes for the table are rebuilt afterwards so subsequent search calls don't return ghost rows.

Parameters:

Name	Type	Description	Default
`id_or_ids`	`Union[Any, List[Any]]`	Primary key value, or a list of values.	required
`table_name`	`str`	Target table.	required

Returns:

Type	Description
`int`	The number of rows actually deleted (0 if no id matched).

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def delete(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    table_name: str,
) -> int:
    """Delete records by primary key from a single table.

    Pass a single id or a list. The FTS / vector indexes for the
    table are rebuilt afterwards so subsequent search calls
    don't return ghost rows.

    Args:
        id_or_ids: Primary key value, or a list of values.
        table_name: Target table.

    Returns:
        The number of rows actually deleted (0 if no id matched).
    """
    return await self.sql_adapter.delete(id_or_ids, table_name=table_name)
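
The "single id or a list" input and the "rows actually deleted" return value can be sketched with the standard-library `sqlite3` module standing in for DuckDB. This is an illustration under assumptions, not the adapter's code: the helper `delete_rows`, its `pk` argument, and the sample table are hypothetical, and the FTS / vector index rebuild described above is not modeled.

```python
import sqlite3
from typing import Any, List, Union


def delete_rows(
    conn: sqlite3.Connection,
    id_or_ids: Union[Any, List[Any]],
    *,
    table_name: str,
    pk: str = "id",
) -> int:
    """Delete rows by primary key and return how many were removed."""
    # Normalize the single-value form to a list.
    ids = list(id_or_ids) if isinstance(id_or_ids, (list, tuple)) else [id_or_ids]
    if not ids:
        return 0
    # Identifiers cannot be bound as parameters, so validate them first.
    if not (table_name.isidentifier() and pk.isidentifier()):
        raise ValueError("unsafe table or column name")
    placeholders = ", ".join("?" for _ in ids)
    cursor = conn.execute(
        f'DELETE FROM "{table_name}" WHERE "{pk}" IN ({placeholders})',
        ids,
    )
    conn.commit()
    return cursor.rowcount  # ids that matched nothing are not counted


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Document" (id TEXT PRIMARY KEY, title TEXT)')
conn.executemany(
    'INSERT INTO "Document" VALUES (?, ?)',
    [("1", "Hello"), ("2", "World"), ("3", "Again")],
)

deleted_one = delete_rows(conn, "1", table_name="Document")
deleted_many = delete_rows(conn, ["2", "missing"], table_name="Document")
deleted_none = delete_rows(conn, "missing", table_name="Document")
remaining = conn.execute('SELECT COUNT(*) FROM "Document"').fetchone()[0]
```

Passing a list that mixes existing and unknown ids is not an error in this sketch; the return value simply reports the rows that were really removed.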

`delete_entity(id_or_ids, *, label)` `async`

Delete entities by primary key from a label.

Incident relations are removed by the adapter.

Parameters:

Name	Type	Description	Default
`id_or_ids`	`Union[Any, List[Any]]`	Primary key value, or a list of values.	required
`label`	`str`	The entity label.	required

Returns:

Type	Description
`int`	The number of entities actually deleted.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def delete_entity(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    label: str,
) -> int:
    """Delete entities by primary key from a label.

    Incident relations are removed by the adapter.

    Args:
        id_or_ids: Primary key value, or a list of values.
        label: The entity label.

    Returns:
        The number of entities actually deleted.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.delete_entity(id_or_ids, label=label)
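
"Incident relations are removed by the adapter" describes what graph databases usually call a detach delete. A toy in-memory model shows the effect; `TinyGraph` and its storage layout are hypothetical and unrelated to how the graph adapter stores data.

```python
from typing import Any, Dict, List, Tuple, Union

NodeKey = Tuple[str, Any]  # (entity label, primary key)


class TinyGraph:
    """Toy graph: nodes keyed by (label, pk), edges as (rel_label, subj, obj)."""

    def __init__(self) -> None:
        self.nodes: Dict[NodeKey, dict] = {}
        self.edges: List[Tuple[str, NodeKey, NodeKey]] = []

    def delete_entity(self, id_or_ids: Union[Any, List[Any]], *, label: str) -> int:
        ids = id_or_ids if isinstance(id_or_ids, list) else [id_or_ids]
        # Only keys that exist count as deleted.
        doomed = {(label, pk) for pk in ids if (label, pk) in self.nodes}
        # Drop every edge that touches a deleted node, in either direction.
        self.edges = [
            edge
            for edge in self.edges
            if edge[1] not in doomed and edge[2] not in doomed
        ]
        for key in doomed:
            del self.nodes[key]
        return len(doomed)


graph = TinyGraph()
graph.nodes = {
    ("Person", 1): {"name": "Ada"},
    ("Person", 2): {"name": "Alan"},
    ("Company", 1): {"name": "Acme"},
}
graph.edges = [
    ("WORKS_AT", ("Person", 1), ("Company", 1)),
    ("KNOWS", ("Person", 2), ("Person", 1)),
    ("WORKS_AT", ("Person", 2), ("Company", 1)),
]

deleted = graph.delete_entity([1, 99], label="Person")
```

Because the key includes the label, `("Company", 1)` survives even though it shares the primary key value `1` with the deleted person.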

`delete_relation(*, label, source_id, target_id)` `async`

Delete a relation between two entities.

Parameters:

Name	Type	Description	Default
`label`	`str`	The relation label.	required
`source_id`	`Any`	The subject (source) entity's primary key.	required
`target_id`	`Any`	The object (target) entity's primary key.	required

Returns:

Type	Description
`int`	The number of edges actually deleted.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def delete_relation(
    self,
    *,
    label: str,
    source_id: Any,
    target_id: Any,
) -> int:
    """Delete a relation between two entities.

    Args:
        label: The relation label.
        source_id: The subject (source) entity's primary key.
        target_id: The object (target) entity's primary key.

    Returns:
        The number of edges actually deleted.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.delete_relation(
        label=label, source_id=source_id, target_id=target_id
    )
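
A relation is identified by its label plus its two endpoints. The sketch below assumes that direction matters (an edge from `source_id` to `target_id` is distinct from the reverse edge), which the `subject` / `object` wording suggests but this page does not state outright. The helper `remove_relation` and the edge dict layout are hypothetical.

```python
from typing import Any, Dict, List


def remove_relation(
    edges: List[Dict[str, Any]],
    *,
    label: str,
    source_id: Any,
    target_id: Any,
) -> int:
    """Remove matching edges in place and return how many were removed."""
    kept = [
        edge
        for edge in edges
        if not (
            edge["label"] == label
            and edge["subj"] == source_id
            and edge["obj"] == target_id
        )
    ]
    deleted = len(edges) - len(kept)
    edges[:] = kept  # mutate the caller's list
    return deleted


edges = [
    {"label": "KNOWS", "subj": "a", "obj": "b"},
    {"label": "KNOWS", "subj": "b", "obj": "a"},  # reverse direction
    {"label": "LIKES", "subj": "a", "obj": "b"},  # different label
]

first = remove_relation(edges, label="KNOWS", source_id="a", target_id="b")
second = remove_relation(edges, label="KNOWS", source_id="a", target_id="b")
```

The second call returns `0` because nothing matches any more, mirroring the "number of edges actually deleted" contract.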

`detect_communities(*, algorithm='louvain', node_labels=None, rel_labels=None, max_iterations=None)` `async`

Run a community-detection algorithm on the graph store.

Returns a `KnowledgeGraphs`: one `KnowledgeGraph` per detected community. Edges that straddle communities are dropped. See the adapter's documentation for algorithm-specific constraints (Louvain requires a single node label; WCC / SCC accept any number).

Parameters:

Name	Type	Description	Default
`algorithm`	`str`	`"louvain"` (default), `"weakly_connected_components"`, or `"strongly_connected_components"`.	`'louvain'`
`node_labels`	`Optional[List[str]]`	Optional whitelist of NODE tables to project. `None` = every existing one.	`None`
`rel_labels`	`Optional[List[str]]`	Optional whitelist of REL tables to project. `None` = every existing one.	`None`
`max_iterations`	`Optional[int]`	Optional upper bound on the algorithm's iteration count. `None` defers to the engine default.	`None`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def detect_communities(
    self,
    *,
    algorithm: str = "louvain",
    node_labels: Optional[List[str]] = None,
    rel_labels: Optional[List[str]] = None,
    max_iterations: Optional[int] = None,
) -> Any:
    """Run a community-detection algorithm on the graph store.

    Returns a `KnowledgeGraphs` — one
    `KnowledgeGraph` per detected community. Edges that
    straddle communities are dropped. See the adapter's
    documentation for algorithm-specific constraints (Louvain
    requires a single node label; WCC / SCC accept any number).

    Args:
        algorithm: ``"louvain"`` (default),
            ``"weakly_connected_components"``, or
            ``"strongly_connected_components"``.
        node_labels: Optional whitelist of NODE tables to
            project. ``None`` = every existing one.
        rel_labels: Optional whitelist of REL tables to project.
            ``None`` = every existing one.
        max_iterations: Optional upper bound on the algorithm's
            iteration count. ``None`` defers to the engine
            default.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.detect_communities(
        algorithm=algorithm,
        node_labels=node_labels,
        rel_labels=rel_labels,
        max_iterations=max_iterations,
    )
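
To make the "edges that straddle communities are dropped" rule concrete, here is a minimal sketch that splits a graph by a precomputed community assignment. It is library-independent: the `assignment` mapping stands in for whatever the chosen algorithm produces, plain dicts stand in for `KnowledgeGraph` objects, and `split_by_community` is a hypothetical helper name.

```python
from collections import defaultdict


def split_by_community(assignment, edges):
    """Group nodes and edges per community; drop cross-community edges."""
    graphs = defaultdict(lambda: {"nodes": [], "edges": []})
    for node, community in assignment.items():
        graphs[community]["nodes"].append(node)
    for subj, label, obj in edges:
        if assignment[subj] == assignment[obj]:
            graphs[assignment[subj]]["edges"].append((subj, label, obj))
        # Otherwise the edge straddles two communities and is dropped.
    return dict(graphs)


# Hypothetical assignment: nodes a, b in community 0; c, d in community 1.
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
edges = [("a", "KNOWS", "b"), ("b", "KNOWS", "c"), ("c", "KNOWS", "d")]
communities = split_by_community(assignment, edges)
```

Here the `b -> c` edge connects two different communities, so it appears in neither resulting subgraph.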

`drop_table(table_name)` `async`

Drop a table from the knowledge base.

Removes the table's rows, FTS index, and HNSW vector index, then drops the table itself. Also forgets the table in the adapter's known-models list.

Parameters:

Name	Type	Description	Default
`table_name`	`str`	Target table.	required

Returns:

Type	Description
`bool`	`True` if a table was dropped, `False` if it didn't exist to begin with.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def drop_table(self, table_name: str) -> bool:
    """Drop a table from the knowledge base.

    Removes the table's rows, FTS index, and HNSW vector index,
    then drops the table itself. Also forgets the table in the
    adapter's known-models list.

    Args:
        table_name: Target table.

    Returns:
        ``True`` if a table was dropped, ``False`` if it didn't
        exist to begin with.
    """
    return await self.sql_adapter.drop_table(table_name)

`entity_fulltext_search(text_or_texts, *, label, k=10, threshold=None, conjunctive=False, bm25_b=None, output_format='json')` `async`

BM25 full-text search over entities of a given label.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Union[str, List[str]]`	Query text or list of query texts.	required
`label`	`str`	The entity label to search within.	required
`k`	`int`	Maximum number of results.	`10`
`threshold`	`Optional[float]`	Optional minimum BM25 score.	`None`
`conjunctive`	`bool`	AND-mode query (every term must match).	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def entity_fulltext_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 full-text search over entities of a given label.

    Args:
        text_or_texts: Query text or list of query texts.
        label: The entity label to search within.
        k: Maximum number of results.
        threshold: Optional minimum BM25 score.
        conjunctive: AND-mode query (every term must match).
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_fulltext_search(
        text_or_texts,
        label=label,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

`entity_hybrid_fts_search(text_or_texts=None, *, keywords=None, label, vector_or_vectors=None, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, output_format='json')` `async`

Reciprocal Rank Fusion (RRF) of vector-similarity and BM25 full-text rankings over entities of a label.

Graph-side counterpart of hybrid_fts_search.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts. Ignored when `vector_or_vectors` is supplied.	`None`
`keywords`	`Optional[Union[str, List[str]]]`	Query text(s) for the BM25 branch.	`None`
`label`	`str`	The entity label to search within.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector(s) for the vector branch, used directly instead of embedding text.	`None`
`k`	`int`	Maximum number of results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`fulltext_threshold`	`Optional[float]`	Optional BM25 threshold.	`None`
`ef_search`	`Optional[int]`	HNSW search-time recall knob (`efs`) for the vector branch. Higher = better recall but slower.	`None`
`conjunctive`	`bool`	AND vs OR for the BM25 branch.	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def entity_hybrid_fts_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    keywords: Optional[Union[str, List[str]]] = None,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """RRF of vector similarity + BM25 fulltext over entities of a label.

    Graph-side counterpart of `hybrid_fts_search`.

    Args:
        text_or_texts: Query text or list of query texts. Ignored
            when ``vector_or_vectors`` is supplied.
        keywords: Query text(s) for the BM25 branch.
        label: The entity label to search within.
        vector_or_vectors: Pre-computed query vector(s) for the
            vector branch, used directly instead of embedding text.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 threshold.
        ef_search: HNSW ``efs`` knob for the vector branch.
        conjunctive: AND vs OR for the BM25 branch.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_hybrid_fts_search(
        text_or_texts=text_or_texts,
        label=label,
        keywords=keywords,
        vector_or_vectors=vector_or_vectors,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )
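
The role of `k_rank` is easier to see with the textbook Reciprocal Rank Fusion formula, where each result receives `1 / (k_rank + rank)` from every ranking it appears in. The sketch below is a generic illustration of that formula, not the graph adapter's code; `rrf_fuse`, the 1-based ranks, and the tie-breaking by identifier are assumptions.

```python
def rrf_fuse(rankings, k_rank=60, k=10):
    """Fuse several ranked id lists with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k_rank + rank)
    # Highest fused score first; ties broken by id for determinism.
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return [(item, scores[item]) for item in ordered[:k]]


# Hypothetical per-branch rankings (best match first).
vector_ranking = ["e2", "e1", "e3"]
bm25_ranking = ["e1", "e4", "e2"]
fused = rrf_fuse([vector_ranking, bm25_ranking])
fused_ids = [item for item, _ in fused]
```

Entities found by both branches (`e1`, `e2`) outrank those found by only one. A larger `k_rank` flattens the difference between adjacent ranks.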

`entity_hybrid_regex_search(text_or_texts=None, *, pattern_or_patterns=None, label, vector_or_vectors=None, fields=None, case_sensitive=True, k=10, k_rank=60, similarity_threshold=None, output_format='json')` `async`

Reciprocal Rank Fusion (RRF) of vector-similarity and regex-match results over entities.

Sibling of entity_hybrid_fts_search. Falls through to entity_similarity_search when no patterns are supplied; falls through to entity_regex_search when there are no vectors to search with.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts for the vector branch. Ignored when `vector_or_vectors` is supplied.	`None`
`pattern_or_patterns`	`Optional[Union[str, List[str]]]`	Regex pattern (or list) for the regex branch. `None` skips the regex side.	`None`
`label`	`str`	The entity label.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector(s) for the vector branch, used directly instead of embedding text.	`None`
`fields`	`Optional[List[str]]`	Forwarded to `entity_regex_search`.	`None`
`case_sensitive`	`bool`	Forwarded to `entity_regex_search`.	`True`
`k`	`int`	Maximum number of results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def entity_hybrid_regex_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    output_format: str = "json",
):
    """RRF fusion of vector similarity + regex match over entities.

    Sibling of `entity_hybrid_fts_search`. Falls through
    to `entity_similarity_search` when no patterns are
    supplied; falls through to `entity_regex_search` when
    there are no vectors to search with.

    Args:
        text_or_texts: Query text or list of query texts for the
            vector branch. Ignored when ``vector_or_vectors`` is
            supplied.
        pattern_or_patterns: Regex pattern (or list) for the
            regex branch. ``None`` skips the regex side.
        label: The entity label.
        vector_or_vectors: Pre-computed query vector(s) for the
            vector branch, used directly instead of embedding text.
        fields: Forwarded to `entity_regex_search`.
        case_sensitive: Forwarded to `entity_regex_search`.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_hybrid_regex_search(
        text_or_texts=text_or_texts,
        pattern_or_patterns=pattern_or_patterns,
        label=label,
        vector_or_vectors=vector_or_vectors,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        output_format=output_format,
    )

`entity_regex_search(pattern, *, label, fields=None, case_sensitive=True, k=10, output_format='json')` `async`

Regex search over entities of a label.

Graph-side counterpart of regex_search. Applies the pattern to every indexed string field on the entity (or to the caller-supplied subset via fields) and returns the rows in which at least one searched field matches.

Parameters:

Name	Type	Description	Default
`pattern`	`str`	The regex pattern.	required
`label`	`str`	The entity label to search within.	required
`fields`	`Optional[List[str]]`	Optional whitelist of fields.	`None`
`case_sensitive`	`bool`	When `False`, matches case-insensitively.	`True`
`k`	`int`	Maximum number of rows.	`10`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def entity_regex_search(
    self,
    pattern: str,
    *,
    label: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    output_format: str = "json",
):
    """Regex search over entities of a label.

    Graph-side counterpart of `regex_search`. Applies the
    pattern to every indexed string field on the entity (or to
    the caller-supplied subset via ``fields``) and returns the
    rows in which at least one searched field matches.

    Args:
        pattern: The regex pattern.
        label: The entity label to search within.
        fields: Optional whitelist of fields.
        case_sensitive: When ``False``, matches case-insensitively.
        k: Maximum number of rows.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_regex_search(
        pattern,
        label=label,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        output_format=output_format,
    )
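
The interaction of `fields`, `case_sensitive`, and `k` can be sketched with Python's standard `re` module over plain dicts. This only mirrors the documented semantics and is not how the graph engine evaluates patterns: the standalone `regex_search` helper and the sample rows are hypothetical, and every string-valued field is treated as "indexed" for simplicity.

```python
import re


def regex_search(rows, pattern, fields=None, case_sensitive=True, k=10):
    """Return up to k rows where at least one searched field matches."""
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    hits = []
    for row in rows:
        if len(hits) >= k:
            break
        # Default: search every string field; otherwise only the whitelist.
        names = fields if fields is not None else list(row)
        if any(
            compiled.search(row[name])
            for name in names
            if isinstance(row.get(name), str)
        ):
            hits.append(row)
    return hits


# Hypothetical entities of one label.
rows = [
    {"id": "1", "name": "Ada Lovelace", "bio": "Mathematician"},
    {"id": "2", "name": "Alan Turing", "bio": "Studied at Cambridge"},
    {"id": "3", "name": "Grace Hopper", "bio": "ada compiler pioneer"},
]

any_field = regex_search(rows, r"\bada\b", case_sensitive=False)
exact_case = regex_search(rows, r"\bada\b")
name_only = regex_search(rows, r"\bada\b", fields=["name"], case_sensitive=False)
```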

`entity_similarity_search(text_or_texts=None, *, label, vector_or_vectors=None, k=10, threshold=None, ef_search=None, output_format='json')` `async`

Vector similarity search over entities of a given label.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts. Ignored when `vector_or_vectors` is supplied.	`None`
`label`	`str`	The entity label to search within.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector or list of vectors to search with directly (no embedding model required).	`None`
`k`	`int`	Maximum number of results.	`10`
`threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`ef_search`	`Optional[int]`	Engine-specific search-time recall knob (HNSW `efs`). Higher = better recall but slower.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def entity_similarity_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Vector similarity search over entities of a given label.

    Args:
        text_or_texts: Query text or list of query texts. Ignored
            when ``vector_or_vectors`` is supplied.
        label: The entity label to search within.
        vector_or_vectors: Pre-computed query vector or list of
            vectors to search with directly (no embedding model
            required).
        k: Maximum number of results.
        threshold: Optional vector-distance threshold.
        ef_search: Engine-specific search-time recall knob (HNSW
            ``efs``). Higher = better recall but slower.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.entity_similarity_search(
        text_or_texts,
        label=label,
        vector_or_vectors=vector_or_vectors,
        k=k,
        threshold=threshold,
        ef_search=ef_search,
        output_format=output_format,
    )
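
As a rough mental model of `k` and `threshold`, the sketch below ranks stored vectors by distance to a query vector and keeps the closest ones. It is a brute-force illustration, not the HNSW index the engine uses. Cosine distance, the "drop anything farther than `threshold`" rule, and the `top_k` helper are assumptions made for the example; the documentation above does not state the metric or the direction of the cutoff.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query, vectors, k=10, threshold=None):
    """Return up to k (key, distance) pairs, closest first."""
    scored = sorted((cosine_distance(query, vec), key) for key, vec in vectors.items())
    if threshold is not None:
        # Assumed semantics: keep only results within the distance cutoff.
        scored = [(dist, key) for dist, key in scored if dist <= threshold]
    return [(key, dist) for dist, key in scored[:k]]


# Hypothetical pre-computed entity embeddings keyed by primary key.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
query = [1.0, 0.0]

nearest = top_k(query, vectors, k=2)
within = top_k(query, vectors, threshold=0.5)
```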

`from_csv(path, *, table_name=None, table_description=None, delimiter=',', encoding='utf-8', header=True)` `async`

Bulk-load a CSV file directly into the knowledge base.

Skips the Python row pipeline entirely (no Pydantic, no Jinja, no per-row INSERT) and instead delegates to the database's native CSV reader. Roughly two orders of magnitude faster than update(CSVDataset(...)) for non-trivial files — see benchmarks/bench_kb_ingest.py.

The target table's schema is inferred directly from the file's columns, with the first column promoted to PRIMARY KEY. The returned SymbolicDataModel is the handle you pass to subsequent search / get calls — you don't need to pre-declare a DataModel for this table.

Use the streaming update(<...>Dataset(...)) path instead when source rows need transformation before storage (column renames, derived fields, HuggingFace datasets, etc.).

Parameters:

Name	Type	Description	Default
`path`	`str`	Path to the CSV file.	required
`table_name`	`Optional[str]`	Target table name. Defaults to the file's stem (`/data/my-docs.csv` → `MyDocs`). Whatever value lands here is always normalized to PascalCase.	`None`
`table_description`	`Optional[str]`	Optional natural-language description attached to the resulting schema.	`None`
`delimiter`	`str`	Field delimiter. Defaults to `","`.	`','`
`encoding`	`str`	File encoding. Defaults to `"utf-8"`.	`'utf-8'`
`header`	`bool`	Whether the first row is a header. Defaults to `True`.	`True`

Returns:

Type	Description
`Any`	The `SymbolicDataModel` for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def from_csv(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
    delimiter: str = ",",
    encoding: str = "utf-8",
    header: bool = True,
) -> Any:
    """Bulk-load a CSV file directly into the knowledge base.

    Skips the Python row pipeline entirely (no Pydantic, no Jinja,
    no per-row INSERT) and instead delegates to the database's
    native CSV reader. Roughly two orders of magnitude faster than
    ``update(CSVDataset(...))`` for non-trivial files — see
    ``benchmarks/bench_kb_ingest.py``.

    The target table's schema is inferred directly from the
    file's columns, with the first column promoted to PRIMARY
    KEY. The returned `SymbolicDataModel` is the handle
    you pass to subsequent search / get calls — you don't need
    to pre-declare a ``DataModel`` for this table.

    Use the streaming ``update(<...>Dataset(...))`` path instead
    when source rows need transformation before storage (column
    renames, derived fields, HuggingFace datasets, etc.).

    Args:
        path: Path to the CSV file.
        table_name: Target table name. Defaults to the file's stem
            (``/data/my-docs.csv`` → ``MyDocs``). Whatever value
            lands here is always normalized to PascalCase.
        table_description: Optional natural-language description
            attached to the resulting schema.
        delimiter: Field delimiter. Defaults to ``","``.
        encoding: File encoding. Defaults to ``"utf-8"``.
        header: Whether the first row is a header. Defaults to
            ``True``.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_csv(
        path,
        table_name=table_name,
        table_description=table_description,
        delimiter=delimiter,
        encoding=encoding,
        header=header,
    )
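
The default table name is derived from the file stem and normalized to PascalCase (`/data/my-docs.csv` becomes `MyDocs`). A minimal sketch of one way to perform that normalization is shown below. The `default_table_name` helper and its splitting rule (break on any non-alphanumeric run) are assumptions; the library's exact rules for digits or mixed-case stems may differ.

```python
import re
from pathlib import PurePosixPath


def default_table_name(path):
    """Derive a PascalCase table name from a file path's stem."""
    stem = PurePosixPath(path).stem
    # Split on runs of non-alphanumeric characters (hyphens, underscores, ...).
    words = re.split(r"[^0-9A-Za-z]+", stem)
    # Capitalize the first character of each word, keep the rest untouched.
    return "".join(word[:1].upper() + word[1:] for word in words if word)


name_from_doc_example = default_table_name("/data/my-docs.csv")
name_from_snake_case = default_table_name("exports/user_events.csv")
```

Knowing the resulting name matters because it is the `table_name` you pass to later `get` or search calls.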

`from_json(path, *, table_name=None, table_description=None)` `async`

Bulk-load a JSON file (top-level array of objects).

Same trade-offs as from_csv / from_parquet — bypasses the Python row pipeline. The file must contain a top-level JSON array. Use from_jsonl for the one-object-per-line NDJSON format.

Parameters:

Name	Type	Description	Default
`path`	`str`	Path to the JSON file.	required
`table_name`	`Optional[str]`	Target table name. Defaults to the file's stem coerced to PascalCase.	`None`
`table_description`	`Optional[str]`	Optional schema description.	`None`

Returns:

Type	Description
`Any`	The `SymbolicDataModel` for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def from_json(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Bulk-load a JSON file (top-level array of objects).

    Same trade-offs as `from_csv` / `from_parquet` —
    bypasses the Python row pipeline. The file must contain a
    top-level JSON array. Use `from_jsonl` for the
    one-object-per-line NDJSON format.

    Args:
        path: Path to the JSON file.
        table_name: Target table name. Defaults to the file's stem
            coerced to PascalCase.
        table_description: Optional schema description.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_json(
        path, table_name=table_name, table_description=table_description
    )
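
The difference between the two JSON layouts is easy to get wrong, so here is a small standard-library sketch that writes the same records both ways and reads them back. The file names and records are hypothetical; only the layouts themselves (a top-level array for `from_json`, one object per line for `from_jsonl`) come from the descriptions above.

```python
import json
import os
import tempfile

records = [{"id": "1", "title": "Hello"}, {"id": "2", "title": "World"}]
workdir = tempfile.mkdtemp()

# Layout expected by from_json: a single top-level JSON array.
json_path = os.path.join(workdir, "documents.json")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(records, f)

# Layout expected by from_jsonl: one JSON object per line (NDJSON).
jsonl_path = os.path.join(workdir, "documents.jsonl")
with open(jsonl_path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading both back yields the same rows.
with open(json_path, encoding="utf-8") as f:
    from_array = json.load(f)
with open(jsonl_path, encoding="utf-8") as f:
    from_lines = [json.loads(line) for line in f if line.strip()]
```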

`from_jsonl(path, *, table_name=None, table_description=None)` `async`

Bulk-load a JSON Lines (NDJSON) file.

Same trade-offs as from_csv / from_parquet, and the right call for very large JSON sources that aren't a single array.

Parameters:

Name	Type	Description	Default
`path`	`str`	Path to the JSONL file.	required
`table_name`	`Optional[str]`	Target table name. Defaults to the file's stem coerced to PascalCase.	`None`
`table_description`	`Optional[str]`	Optional schema description.	`None`

Returns:

Type	Description
`Any`	The `SymbolicDataModel` for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def from_jsonl(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Bulk-load a JSON Lines (NDJSON) file.

    Same trade-offs as `from_csv` / `from_parquet`,
    and the right call for very large JSON sources that aren't
    a single array.

    Args:
        path: Path to the JSONL file.
        table_name: Target table name. Defaults to the file's stem
            coerced to PascalCase.
        table_description: Optional schema description.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_jsonl(
        path, table_name=table_name, table_description=table_description
    )

`from_parquet(path, *, table_name=None, table_description=None)` `async`

Bulk-load a Parquet file directly into the knowledge base.

Same trade-offs as from_csv — bypasses the Python row pipeline for native database ingestion. Parquet's schema is explicit in the file footer so there is no type-inference guesswork to worry about.

Parameters:

Name	Type	Description	Default
`path`	`str`	Path to the Parquet file.	required
`table_name`	`Optional[str]`	Target table name. Defaults to the file's stem coerced to PascalCase.	`None`
`table_description`	`Optional[str]`	Optional schema description.	`None`

Returns:

Type	Description
`Any`	The `SymbolicDataModel` for the loaded table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def from_parquet(
    self,
    path: str,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Bulk-load a Parquet file directly into the knowledge base.

    Same trade-offs as `from_csv` — bypasses the Python row
    pipeline for native database ingestion. Parquet's schema is
    explicit in the file footer so there is no type-inference
    guesswork to worry about.

    Args:
        path: Path to the Parquet file.
        table_name: Target table name. Defaults to the file's stem
            coerced to PascalCase.
        table_description: Optional schema description.

    Returns:
        The `SymbolicDataModel` for the loaded table.
    """
    return await self.sql_adapter.from_parquet(
        path, table_name=table_name, table_description=table_description
    )

`fulltext_search(text_or_texts, *, table_name, k=10, threshold=None, conjunctive=False, bm25_b=None, bm25_k=None, output_format='json')` `async`

BM25 full-text search against a single table.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Union[str, List[str]]`	Query text or list of query texts.	required
`table_name`	`str`	Target table.	required
`k`	`int`	Maximum number of results.	`10`
`threshold`	`Optional[float]`	Optional minimum BM25 score.	`None`
`conjunctive`	`bool`	AND-mode query (every term must match). Default `False` keeps OR semantics.	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter (document-length normalization).	`None`
`bm25_k`	`Optional[float]`	Optional override for BM25's `k1` parameter (term-frequency saturation).	`None`
`output_format`	`str`	`"json"` (list of dicts, default) / `"csv"` (text).	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def fulltext_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    table_name: str,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    bm25_k: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 full-text search against a single table.

    Args:
        text_or_texts: Query text or list of query texts.
        table_name: Target table.
        k: Maximum number of results.
        threshold: Optional minimum BM25 score.
        conjunctive: AND-mode query (every term must match).
            Default ``False`` keeps OR semantics.
        bm25_b: Optional override for BM25's ``b`` parameter
            (document-length normalization).
        bm25_k: Optional override for BM25's ``k1`` parameter
            (term-frequency saturation).
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.fulltext_search(
        text_or_texts,
        table_name=table_name,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        bm25_k=bm25_k,
        output_format=output_format,
    )
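
For intuition about `bm25_b`, `bm25_k`, and `conjunctive`, the sketch below scores a tiny corpus with the textbook BM25 formula. It is not the database's full-text implementation: whitespace tokenization, the smoothed IDF variant, the defaults `k1=1.2` and `b=0.75`, and the `bm25_scores` helper are all assumptions chosen for illustration.

```python
import math


def bm25_scores(query, docs, k1=1.2, b=0.75, conjunctive=False):
    """Score each document against the query with textbook BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = query.lower().split()
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        # AND mode: a document must contain every query term to score at all.
        if conjunctive and not all(term in tokens for term in terms):
            scores.append(0.0)
            continue
        score = 0.0
        for term in terms:
            tf = tokens.count(term)
            if tf == 0:
                continue
            df = sum(1 for other in tokenized if term in other)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # b controls how strongly long documents are penalized.
            length_norm = 1 - b + b * len(tokens) / avg_len
            # k1 controls how quickly repeated terms stop adding score.
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["hello world", "hello there general kenobi", "goodbye world"]
or_scores = bm25_scores("hello world", docs)
and_scores = bm25_scores("hello world", docs, conjunctive=True)
no_length_norm = bm25_scores("hello", docs, b=0.0)
```

With `b=0.0` document length is ignored, so the short and long documents containing "hello" once receive the same score.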

`get(id_or_ids, *, table_name)` `async`

Retrieve one or more records by primary key from a single table.

Parameters:

Name	Type	Description	Default
`id_or_ids`	`Union[Any, List[Any]]`	A single primary key value, or a list of values.	required
`table_name`	`str`	Target table.	required

Returns:

Type	Description
`Union[Optional[Any], List[Optional[Any]]]`	A single JsonDataModel (or `None`) when called with one id; a list of JsonDataModels (with `None` in the slots that did not match) when called with a list.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def get(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    table_name: str,
) -> Union[Optional[Any], List[Optional[Any]]]:
    """Retrieve one or more records by primary key from a single table.

    Args:
        id_or_ids: A single primary key value, or a list of values.
        table_name: Target table.

    Returns:
        A single JsonDataModel (or ``None``) when called with one id;
        a list of JsonDataModels (with ``None`` in the slots that did
        not match) when called with a list.
    """
    return await self.sql_adapter.get(id_or_ids, table_name=table_name)
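
The scalar-versus-list return shape can be modeled with a plain dict lookup. This sketch only mirrors the documented contract: the `table` dict and the standalone synchronous `get` helper are hypothetical, and the positional alignment of results with the requested ids is inferred from the phrase "in the slots that did not match".

```python
def get(table, id_or_ids):
    """Look up one key or a list of keys; misses become None."""
    if isinstance(id_or_ids, list):
        # One result slot per requested id, in the same order.
        return [table.get(key) for key in id_or_ids]
    return table.get(id_or_ids)


# Hypothetical table keyed by primary key.
table = {
    "1": {"id": "1", "title": "Hello"},
    "2": {"id": "2", "title": "World"},
}

single_hit = get(table, "1")
single_miss = get(table, "9")
batch = get(table, ["1", "9", "2"])
```

Callers handling batches should therefore check each slot for `None` rather than assuming the result list is shorter when some ids are missing.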

`get_entity(id_or_ids, *, label)` `async`

Retrieve one or more entities by primary key from a label.

Parameters:

Name	Type	Description	Default
`id_or_ids`	`Union[Any, List[Any]]`	A single primary key value, or a list of values.	required
`label`	`str`	The entity label (node type).	required

Returns:

Type	Description
`Union[Optional[Any], List[Optional[Any]]]`	A single `JsonDataModel` (or `None`) for a scalar argument; a list (with `None` for misses) for a list argument.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def get_entity(
    self,
    id_or_ids: Union[Any, List[Any]],
    *,
    label: str,
) -> Union[Optional[Any], List[Optional[Any]]]:
    """Retrieve one or more entities by primary key from a label.

    Args:
        id_or_ids: A single primary key value, or a list of values.
        label: The entity label (node type).

    Returns:
        A single ``JsonDataModel`` (or ``None``) for a scalar
        argument; a list (with ``None`` for misses) for a list
        argument.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.get_entity(id_or_ids, label=label)

`get_symbolic_data_models()`

Retrieve all symbolic data models (table definitions) from the database.

Returns a list of SymbolicDataModel objects representing each table in the database. This is useful for introspecting the database schema or for passing to search methods to limit the search scope.

Returns:

Name	Type	Description
`list`	`List[Any]`	List of symbolic data models representing the database tables.

Example

symbolic_models = knowledge_base.get_symbolic_data_models()
for model in symbolic_models:
    schema = model.get_schema()
    print(f"Table: {schema['title']}")
    print(f"Fields: {list(schema['properties'].keys())}")

Source code in synalinks/src/knowledge_bases/knowledge_base.py

def get_symbolic_data_models(self) -> List[Any]:
    """Retrieve all symbolic data models (table definitions) from the database.

    Returns a list of SymbolicDataModel objects representing each table
    in the database. This is useful for introspecting the database schema
    or for passing to search methods to limit the search scope.

    Returns:
        list: List of symbolic data models representing the database tables.

    Example:
        ```python
        symbolic_models = knowledge_base.get_symbolic_data_models()
        for model in symbolic_models:
            schema = model.get_schema()
            print(f"Table: {schema['title']}")
            print(f"Fields: {list(schema['properties'].keys())}")
        ```
    """
    return self.sql_adapter.get_symbolic_data_models()

`get_symbolic_entities()`

Retrieve a SymbolicDataModel per node label in the graph.

Graph-side counterpart of get_symbolic_data_models, split by graph role: returns only entity (node) schemas. Each schema carries a label const discriminator and one property per stored column.

Returns:

Type	Description
`List[Any]`	list[SymbolicDataModel]: one per existing node label.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

def get_symbolic_entities(self) -> List[Any]:
    """Retrieve a ``SymbolicDataModel`` per node label in the graph.

    Graph-side counterpart of `get_symbolic_data_models`,
    split by graph role: returns only entity (node) schemas.
    Each schema carries a ``label`` ``const`` discriminator and
    one property per stored column.

    Returns:
        list[SymbolicDataModel]: one per existing node label.
    """
    self._require_graph_adapter()
    return self.graph_adapter.get_symbolic_entities()

`get_symbolic_relations()`

Retrieve a SymbolicDataModel per relation label in the graph.

Each returned schema includes its endpoint node schemas under $defs and references them as subj / obj via $ref — same shape Pydantic v2 emits for a hand-written synalinks.Relation subclass.

Returns:

Type	Description
`List[Any]`	list[SymbolicDataModel]: one per existing relation label.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

def get_symbolic_relations(self) -> List[Any]:
    """Retrieve a ``SymbolicDataModel`` per relation label in the graph.

    Each returned schema includes its endpoint node schemas under
    ``$defs`` and references them as ``subj`` / ``obj`` via
    ``$ref`` — same shape Pydantic v2 emits for a hand-written
    `synalinks.Relation` subclass.

    Returns:
        list[SymbolicDataModel]: one per existing relation label.
    """
    self._require_graph_adapter()
    return self.graph_adapter.get_symbolic_relations()

`getall(*, table_name, limit=50, offset=0)` `async`

Retrieve all records from a table with pagination.

Parameters:

Name	Type	Description	Default
`table_name`	`str`	Target table.	required
`limit`	`int`	Maximum number of records to return (default: 50).	`50`
`offset`	`int`	Number of records to skip (default: 0).	`0`

Returns:

Type	Description
`List[Any]`	List of JsonDataModels.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def getall(
    self,
    *,
    table_name: str,
    limit: int = 50,
    offset: int = 0,
) -> List[Any]:
    """Retrieve all records from a table with pagination.

    Args:
        table_name: Target table.
        limit: Maximum number of records to return (default: 50).
        offset: Number of records to skip (default: 0).

    Returns:
        List of JsonDataModels.
    """
    return await self.sql_adapter.getall(
        table_name=table_name, limit=limit, offset=offset
    )

`global_graph_search(*, node_labels=None, k=10, members_per_community=10, output_format='json')` `async`

GraphRAG-style global search on the graph store.

Rolls up the community and rank properties stamped by build_communities into one aggregate row per community (size, total rank, representative members), ordered by importance. This is the theme-centric counterpart to local_graph_search ("what are the overall patterns across the whole graph"). Requires build_communities to have run first. See GraphDatabaseAdapter.global_graph_search.
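The roll-up can be pictured as a plain group-by over nodes that already carry `community` and `rank` values. The sketch below is an in-memory illustration, not the adapter's query: the node dictionaries, the `rollup_communities` name, and the choice of total rank as the "importance" ordering are assumptions.

```python
from collections import defaultdict


def rollup_communities(nodes, k=10, members_per_community=10):
    """Aggregate stamped nodes into one row per community."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["community"]].append(node)

    rows = []
    for community, members in groups.items():
        # Highest-ranked members first, so the kept ones are representative.
        members.sort(key=lambda n: n["rank"], reverse=True)
        rows.append(
            {
                "community": community,
                "size": len(members),
                "total_rank": sum(n["rank"] for n in members),
                "members": [n["id"] for n in members[:members_per_community]],
            }
        )

    # Assumption: "importance" is taken to be the community's total rank.
    rows.sort(key=lambda r: r["total_rank"], reverse=True)
    return rows[:k]


nodes = [
    {"id": "a", "community": 0, "rank": 0.5},
    {"id": "b", "community": 0, "rank": 0.25},
    {"id": "c", "community": 1, "rank": 0.125},
]
rows = rollup_communities(nodes, k=10, members_per_community=2)
```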

Parameters:

Name	Type	Description	Default
`node_labels`	`Optional[List[str]]`	Optional NODE-table whitelist (`None` = every stamped table).	`None`
`k`	`int`	Maximum number of communities to return.	`10`
`members_per_community`	`int`	Cap on members carried per community.	`10`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def global_graph_search(
    self,
    *,
    node_labels: Optional[List[str]] = None,
    k: int = 10,
    members_per_community: int = 10,
    output_format: str = "json",
):
    """GraphRAG-style *global* search on the graph store.

    Rolls up the community and rank properties stamped by
    `build_communities` into one aggregate row per community
    (size, total rank, representative members), ordered by
    importance. This is the theme-centric counterpart to
    `local_graph_search` ("what are the overall patterns
    across the *whole* graph"). Requires `build_communities`
    to have run first. See
    `GraphDatabaseAdapter.global_graph_search`.

    Args:
        node_labels: Optional NODE-table whitelist (``None`` = every
            stamped table).
        k: Maximum number of communities to return.
        members_per_community: Cap on members carried per community.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.global_graph_search(
        node_labels=node_labels,
        k=k,
        members_per_community=members_per_community,
        output_format=output_format,
    )

`hybrid_fts_search(text_or_texts=None, *, keywords=None, table_name, vector_or_vectors=None, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, bm25_k=None, output_format='json')` `async`

Reciprocal-Rank-Fusion of vector similarity + BM25 fulltext.

Falls back to full-text-only when there are no vectors to search with. The regex-side sibling is hybrid_regex_search.
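For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the textbook form: each result receives `1 / (k_rank + rank)` from every ranking it appears in, and the contributions are summed. This illustrates the general technique rather than the adapter's actual query; `rrf_fuse` and the document ids are hypothetical.

```python
def rrf_fuse(rankings, k_rank=60, k=10):
    """Fuse several ranked id lists with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k_rank + rank)
    # Highest fused score first, truncated to k results.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


vector_hits = ["d1", "d2", "d3"]  # best vector match first
bm25_hits = ["d3", "d1", "d4"]    # best BM25 match first
fused = rrf_fuse([vector_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]  # ["d1", "d3", "d2", "d4"]
```

Documents found by both branches (`d1`, `d3`) outrank those found by only one. A smaller `k_rank` widens the gap between rank 1 and lower ranks, which is why lower values emphasize top ranks more strongly.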

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts. Ignored when `vector_or_vectors` is supplied.	`None`
`keywords`	`Optional[Union[str, List[str]]]`	Query text(s) for the BM25 branch.	`None`
`table_name`	`str`	Target table.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector(s) for the vector branch, used directly instead of embedding text.	`None`
`k`	`int`	Maximum results.	`10`
`k_rank`	`int`	RRF smoothing constant. Lower values emphasize top ranks more strongly (default: 60).	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`fulltext_threshold`	`Optional[float]`	Optional BM25 threshold.	`None`
`ef_search`	`Optional[int]`	Forwarded to the vector branch; HNSW search-time candidate-list depth.	`None`
`conjunctive`	`bool`	Forwarded to the BM25 branch; AND-mode query.	`False`
`bm25_b`	`Optional[float]`	Forwarded to the BM25 branch; document-length normalization override.	`None`
`bm25_k`	`Optional[float]`	Forwarded to the BM25 branch; term-frequency saturation override.	`None`
`output_format`	`str`	`"json"` (list of dicts, default) / `"csv"` (text).	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def hybrid_fts_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    keywords: Optional[Union[str, List[str]]] = None,
    table_name: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    bm25_k: Optional[float] = None,
    output_format: str = "json",
):
    """Reciprocal-Rank-Fusion of vector similarity + BM25 fulltext.

    Falls back to full-text-only when there are no vectors to search
    with. The regex-side sibling is `hybrid_regex_search`.

    Args:
        text_or_texts: Query text or list of query texts. Ignored
            when ``vector_or_vectors`` is supplied.
        keywords: Query text(s) for the BM25 branch.
        table_name: Target table.
        vector_or_vectors: Pre-computed query vector(s) for the
            vector branch, used directly instead of embedding text.
        k: Maximum results.
        k_rank: RRF smoothing constant. Lower emphasizes top
            ranks more strongly (default: 60).
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 threshold.
        ef_search: Forwarded to the vector branch; HNSW
            search-time candidate-list depth.
        conjunctive: Forwarded to the BM25 branch; AND-mode query.
        bm25_b: Forwarded to the BM25 branch; document-length
            normalization override.
        bm25_k: Forwarded to the BM25 branch; term-frequency
            saturation override.
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.hybrid_fts_search(
        text_or_texts=text_or_texts,
        table_name=table_name,
        keywords=keywords,
        vector_or_vectors=vector_or_vectors,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        bm25_k=bm25_k,
        output_format=output_format,
    )

`hybrid_regex_search(text_or_texts=None, *, pattern_or_patterns=None, table_name, vector_or_vectors=None, k=10, k_rank=60, similarity_threshold=None, ef_search=None, fields=None, case_sensitive=True, output_format='json')` `async`

Reciprocal-Rank-Fusion of vector similarity + regex.

The regex-side counterpart to hybrid_fts_search (which pairs vector with BM25 fulltext). The two signals are orthogonal: vectors capture semantic similarity, regex captures exact textual shape. Ranks are fused with the same RRF formula.
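The regex side produces a yes/no signal per row. The sketch below illustrates how `fields` and `case_sensitive` could shape that signal, using Python's `re` module as a stand-in for RE2 (the two syntaxes overlap for simple patterns like this one but are not identical). The `regex_branch` helper and the sample rows are hypothetical.

```python
import re


def regex_branch(rows, pattern, fields=None, case_sensitive=True):
    """Return ids of rows where any inspected text field matches `pattern`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(pattern, flags)
    hits = []
    for row in rows:
        # Assumption: with no whitelist, every string-valued field is inspected.
        names = fields if fields is not None else list(row)
        if any(
            isinstance(row.get(name), str) and compiled.search(row[name])
            for name in names
        ):
            hits.append(row["id"])
    return hits


rows = [
    {"id": "1", "title": "Crash report", "content": "Failed with ERR-1042 on startup"},
    {"id": "2", "title": "err-2001 follow-up", "content": "No code in the body"},
    {"id": "3", "title": "Release notes", "content": "Nothing to report"},
]

strict = regex_branch(rows, r"ERR-\d{4}")
relaxed = regex_branch(rows, r"ERR-\d{4}", case_sensitive=False)
titles_only = regex_branch(rows, r"ERR-\d{4}", fields=["title"], case_sensitive=False)
```

A pattern like this captures an exact textual shape (an error code) that an embedding is unlikely to pin down, which is the complementary signal the hybrid search fuses with vector similarity.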

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Natural-language query (or list) for the vector side. Ignored when `vector_or_vectors` is supplied.	`None`
`pattern_or_patterns`	`Union[str, List[str], None]`	RE2 pattern (or list) for the regex side. `None` falls back to plain similarity search.	`None`
`table_name`	`str`	Target table.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector(s) for the vector side, used directly instead of embedding text.	`None`
`k`	`int`	Maximum results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Vector-distance threshold.	`None`
`ef_search`	`Optional[int]`	Forwarded to the vector branch; HNSW search-time candidate-list depth.	`None`
`fields`	`Optional[List[str]]`	Forwarded to the regex side.	`None`
`case_sensitive`	`bool`	Forwarded to the regex side.	`True`
`output_format`	`str`	`"json"` (list of dicts, default) / `"csv"` (text).	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def hybrid_regex_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    pattern_or_patterns: Union[str, List[str], None] = None,
    table_name: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    output_format: str = "json",
):
    """Reciprocal-Rank-Fusion of vector similarity + regex.

    The regex-side counterpart to `hybrid_fts_search` (which
    pairs vector with BM25 fulltext). The two signals are
    orthogonal: vectors capture semantic similarity, regex
    captures exact textual shape. Ranks are fused with the same
    RRF formula.

    Args:
        text_or_texts: Natural-language query (or list) for the
            vector side. Ignored when ``vector_or_vectors`` is
            supplied.
        pattern_or_patterns: RE2 pattern (or list) for the regex
            side. ``None`` falls back to plain similarity search.
        table_name: Target table.
        vector_or_vectors: Pre-computed query vector(s) for the
            vector side, used directly instead of embedding text.
        k: Maximum results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Vector-distance threshold.
        ef_search: Forwarded to the vector branch; HNSW
            search-time candidate-list depth.
        fields: Forwarded to the regex side.
        case_sensitive: Forwarded to the regex side.
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.hybrid_regex_search(
        text_or_texts=text_or_texts,
        pattern_or_patterns=pattern_or_patterns,
        table_name=table_name,
        vector_or_vectors=vector_or_vectors,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        ef_search=ef_search,
        fields=fields,
        case_sensitive=case_sensitive,
        output_format=output_format,
    )

`hybrid_search(*args, **kwargs)` `async`

Deprecated alias of hybrid_fts_search.

Kept for backwards compatibility. The new name, hybrid_fts_search, is symmetric with hybrid_regex_search; prefer it in new code.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def hybrid_search(self, *args, **kwargs):
    """Deprecated alias of `hybrid_fts_search`.

    Kept for backwards compatibility. The new name,
    `hybrid_fts_search`, is symmetric with `hybrid_regex_search`;
    prefer it in new code.
    """
    return await self.hybrid_fts_search(*args, **kwargs)

`local_graph_search(text_or_texts=None, *, label, vector_or_vectors=None, max_hops=2, k=10, threshold=None, rel_label=None, ef_search=None)` `async`

GraphRAG-style local search on the graph store.

Vector-matches k seed entities of label, expands each seed's undirected neighbourhood out to max_hops edges, and returns the deduplicated union as a KnowledgeGraph: the local context subgraph for entity-centric questions ("what does the graph say around these entities"). See GraphDatabaseAdapter.local_graph_search.
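The expansion step can be sketched as a multi-source breadth-first search over an undirected view of the edges. The code below is an in-memory illustration under stated assumptions, not the adapter's Cypher: the seeds are taken as given (the vector match is skipped), and the function name and the choice to return the induced edges are hypothetical.

```python
from collections import deque


def expand_neighbourhood(edges, seeds, max_hops=2):
    """Collect every node within `max_hops` undirected edges of any seed."""
    adjacency = {}
    for subj, obj in edges:
        # Undirected: an edge can be walked in both directions.
        adjacency.setdefault(subj, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subj)

    depth = {seed: 0 for seed in seeds}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # radius reached, do not expand further
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:  # dedupe across seeds
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)

    # Keep the edges whose endpoints both landed in the neighbourhood.
    kept_edges = [(s, o) for s, o in edges if s in depth and o in depth]
    return set(depth), kept_edges


chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
nodes_from_a, edges_from_a = expand_neighbourhood(chain, ["a"], max_hops=2)
nodes_two_seeds, _ = expand_neighbourhood(chain, ["a", "e"], max_hops=1)
```

Seeding the queue with all seeds at depth 0 yields the union of the per-seed neighbourhoods in one pass, which mirrors the "neighbourhoods merge" behaviour described for a list of query texts.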

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list); neighbourhoods merge. Ignored when `vector_or_vectors` is supplied.	`None`
`label`	`str`	Entity label whose vector index seeds the search.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed seed vector(s), used directly instead of embedding `text_or_texts`.	`None`
`max_hops`	`int`	Neighbourhood radius in edges (>= 1, default 2).	`2`
`k`	`int`	Number of seed entities per query text.	`10`
`threshold`	`Optional[float]`	Optional seed vector-distance ceiling.	`None`
`rel_label`	`Optional[str]`	Optional rel-label constraint per hop.	`None`
`ef_search`	`Optional[int]`	Optional HNSW search-depth for the seed lookup.	`None`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def local_graph_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    max_hops: int = 2,
    k: int = 10,
    threshold: Optional[float] = None,
    rel_label: Optional[str] = None,
    ef_search: Optional[int] = None,
):
    """GraphRAG-style *local* search on the graph store.

    Vector-matches ``k`` seed entities of ``label``, expands their
    ``max_hops`` undirected neighbourhood, and returns the deduped
    union as a `KnowledgeGraph` — the local context subgraph
    for entity-centric questions ("what does the graph say around
    *these* entities"). See
    `GraphDatabaseAdapter.local_graph_search`.

    Args:
        text_or_texts: Query text (or list); neighbourhoods merge.
            Ignored when ``vector_or_vectors`` is supplied.
        label: Entity label whose vector index seeds the search.
        vector_or_vectors: Pre-computed seed vector(s), used directly
            instead of embedding ``text_or_texts``.
        max_hops: Neighbourhood radius in edges (>= 1, default 2).
        k: Number of seed entities per query text.
        threshold: Optional seed vector-distance ceiling.
        rel_label: Optional rel-label constraint per hop.
        ef_search: Optional HNSW search-depth for the seed lookup.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.local_graph_search(
        text_or_texts,
        label=label,
        vector_or_vectors=vector_or_vectors,
        max_hops=max_hops,
        k=k,
        threshold=threshold,
        rel_label=rel_label,
        ef_search=ef_search,
    )

`pagerank(*, node_labels=None, rel_labels=None, damping_factor=0.85, max_iterations=100, tolerance=None, normalize_initial=None, k=None, output_format='json')` `async`

Rank entities by PageRank importance on the graph store.

Returns rows shaped like {<pk_column>: <pk_value>, "label": <label>, "node": <full node>, "rank": <float>} sorted by rank descending. The per-label PK column name is preserved verbatim, mirroring entity_similarity_search.
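As a reminder of what `damping_factor`, `max_iterations` and `tolerance` control, here is a textbook power-iteration PageRank in plain Python. It is a sketch of the general algorithm, not the graph engine's implementation: the uniform handling of dangling nodes, the `1e-6` tolerance, and the `pagerank_sketch` name are assumptions.

```python
def pagerank_sketch(nodes, edges, damping_factor=0.85, max_iterations=100, tolerance=1e-6):
    """Power-iteration PageRank over a directed edge list."""
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}  # uniform initial vector
    for _ in range(max_iterations):
        # Rank held by nodes without out-edges is spread uniformly (assumption).
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {node: base for node in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping_factor * rank[src] / len(targets)

        # L1 change between iterations drives the early stop.
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tolerance:
            break

    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


ranked = pagerank_sketch(
    ["a", "b", "c", "d"],
    [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")],
)
```

Node `c` receives links from every other node and therefore ranks first; `a` follows because it is the only node `c` links to.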

Parameters:

Name	Type	Description	Default
`node_labels`	`Optional[List[str]]`	Optional whitelist of NODE tables. `None` projects every existing one.	`None`
`rel_labels`	`Optional[List[str]]`	Optional whitelist of REL tables. `None` projects every existing one.	`None`
`damping_factor`	`float`	Probability of following an edge vs teleporting; 0.85 is the textbook value.	`0.85`
`max_iterations`	`int`	Upper bound on the number of iterations; the run stops there even if it has not converged.	`100`
`tolerance`	`Optional[float]`	Optional convergence threshold; the algorithm stops early when the L1 change between iterations falls below this value. `None` defers to the engine default.	`None`
`normalize_initial`	`Optional[bool]`	Whether to normalize the initial rank vector. `None` defers to the engine default.	`None`
`k`	`Optional[int]`	Optional cap on returned rows.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def pagerank(
    self,
    *,
    node_labels: Optional[List[str]] = None,
    rel_labels: Optional[List[str]] = None,
    damping_factor: float = 0.85,
    max_iterations: int = 100,
    tolerance: Optional[float] = None,
    normalize_initial: Optional[bool] = None,
    k: Optional[int] = None,
    output_format: str = "json",
):
    """Rank entities by PageRank importance on the graph store.

    Returns rows shaped like
    ``{<pk_column>: <pk_value>, "label": <label>, "node": <full node>,
    "rank": <float>}`` sorted by ``rank`` descending. The per-label
    PK column name is preserved verbatim, mirroring
    `entity_similarity_search`.

    Args:
        node_labels: Optional whitelist of NODE tables. ``None``
            projects every existing one.
        rel_labels: Optional whitelist of REL tables. ``None``
            projects every existing one.
        damping_factor: Probability of following an edge vs
            teleporting; 0.85 is the textbook value.
        max_iterations: Upper bound on iterations before
            convergence.
        tolerance: Optional convergence threshold; the algorithm
            stops early when the L1 change between iterations
            falls below this value. ``None`` defers to the
            engine default.
        normalize_initial: Whether to normalize the initial rank
            vector. ``None`` defers to the engine default.
        k: Optional cap on returned rows.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.pagerank(
        node_labels=node_labels,
        rel_labels=rel_labels,
        damping_factor=damping_factor,
        max_iterations=max_iterations,
        tolerance=tolerance,
        normalize_initial=normalize_initial,
        k=k,
        output_format=output_format,
    )

`path_fulltext_search(subj_text_or_texts, obj_text_or_texts, *, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, threshold=None, conjunctive=False, bm25_b=None, output_format='json')` `async`

BM25 variable-length path search, AND semantics.

Same shape as path_similarity_search but driven by BM25 fulltext on each endpoint. Per matched path, score is the sum of the subject-side and object-side BM25 scores.
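The scoring rule can be sketched in a few lines: a path survives only if both of its endpoints matched their query, and its score is the sum of the two endpoint scores. The example below assumes the per-endpoint BM25 scores and candidate paths are already available; the data, the tuple layout, and the `score_paths` helper are hypothetical.

```python
def score_paths(paths, subj_scores, obj_scores, min_hops=1, max_hops=3, k=10):
    """Keep paths whose endpoints both matched, scored by summed BM25."""
    results = []
    for subj, obj, hops in paths:
        if not (min_hops <= hops <= max_hops):
            continue  # hop bounds are inclusive
        if subj not in subj_scores or obj not in obj_scores:
            continue  # AND semantics: both endpoints must have matched
        results.append(
            {
                "subj": subj,
                "obj": obj,
                "hops": hops,
                "score": subj_scores[subj] + obj_scores[obj],
            }
        )
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:k]


subj_scores = {"alice": 2.0, "bob": 1.5}    # subject-side BM25 hits
obj_scores = {"acme": 3.0, "globex": 0.5}   # object-side BM25 hits
paths = [
    ("alice", "acme", 1),
    ("bob", "acme", 2),
    ("alice", "globex", 4),  # too long for max_hops=3
    ("carol", "acme", 1),    # subject did not match
]
scored = score_paths(paths, subj_scores, obj_scores)
```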

Parameters:

Name	Type	Description	Default
`subj_text_or_texts`	`Union[str, List[str]]`	Keyword query (or list) for the subject.	required
`obj_text_or_texts`	`Union[str, List[str]]`	Keyword query (or list) for the object.	required
`subj_label`	`str`	Entity label of the subject endpoint.	required
`obj_label`	`str`	Entity label of the object endpoint.	required
`label`	`Optional[str]`	Optional rel-label constraint for every hop.	`None`
`min_hops`	`int`	Minimum hop count, inclusive (default: 1).	`1`
`max_hops`	`int`	Maximum hop count, inclusive (default: 3).	`3`
`k`	`int`	Maximum number of results.	`10`
`threshold`	`Optional[float]`	Optional minimum BM25 threshold per endpoint.	`None`
`conjunctive`	`bool`	AND-mode BM25 query.	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def path_fulltext_search(
    self,
    subj_text_or_texts: Union[str, List[str]],
    obj_text_or_texts: Union[str, List[str]],
    *,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 variable-length path search, AND semantics.

    Same shape as `path_similarity_search` but driven by BM25
    fulltext on each endpoint. Per matched path, ``score`` is the
    sum of the subject-side and object-side BM25 scores.

    Args:
        subj_text_or_texts: Keyword query (or list) for the subject.
        obj_text_or_texts: Keyword query (or list) for the object.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        threshold: Optional minimum BM25 threshold per endpoint.
        conjunctive: AND-mode BM25 query.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_fulltext_search(
        subj_text_or_texts=subj_text_or_texts,
        obj_text_or_texts=obj_text_or_texts,
        subj_label=subj_label,
        obj_label=obj_label,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

`path_hybrid_fts_search(subj_text_or_texts=None, obj_text_or_texts=None, *, subj_keywords=None, obj_keywords=None, subj_label, obj_label, subj_vector_or_vectors=None, obj_vector_or_vectors=None, label=None, min_hops=1, max_hops=3, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, output_format='json')` `async`

Hybrid variable-length path search where BOTH endpoints match.

AND semantics: each endpoint is hybrid-searched (vector + full-text) independently, and each matching path's rrf_score is the sum of the subject-side and object-side hybrid scores. Any side that has no vectors to search with falls back to full-text-only.
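The sketch below combines the two ideas: a per-endpoint RRF score, with a full-text-only fallback when a side has no vector ranking, and a path score that sums both endpoints. How the adapter scores the fallback case is not documented here, so treating it as a single-ranking RRF is an assumption, as are the `side_scores` helper and the sample data.

```python
def side_scores(fts_ranking, vector_ranking=None, k_rank=60):
    """RRF score per entity for one endpoint."""
    # Assumption: with no vector ranking, only the full-text ranking contributes.
    if vector_ranking is None:
        rankings = [fts_ranking]
    else:
        rankings = [vector_ranking, fts_ranking]
    scores = {}
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] = scores.get(entity, 0.0) + 1.0 / (k_rank + rank)
    return scores


subj_scores = side_scores(fts_ranking=["alice"], vector_ranking=["alice", "bob"])
obj_scores = side_scores(fts_ranking=["acme"])  # full-text-only side

paths = [("alice", "acme"), ("bob", "acme"), ("bob", "globex")]
scored = sorted(
    (
        (subj, obj, subj_scores[subj] + obj_scores[obj])
        for subj, obj in paths
        if subj in subj_scores and obj in obj_scores  # both endpoints must match
    ),
    key=lambda row: row[2],
    reverse=True,
)
```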

Parameters:

Name	Type	Description	Default
`subj_text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list) for the subject. Ignored when `subj_vector_or_vectors` is supplied.	`None`
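`obj_text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list) for the object. Ignored when `obj_vector_or_vectors` is supplied.	`None`
`subj_keywords`	`Optional[Union[str, List[str]]]`	Query text(s) for the subject-side BM25 branch.	`None`
`obj_keywords`	`Optional[Union[str, List[str]]]`	Query text(s) for the object-side BM25 branch.	`None`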
`subj_label`	`str`	Entity label of the subject endpoint.	required
`obj_label`	`str`	Entity label of the object endpoint.	required
`subj_vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed subject query vector(s).	`None`
`obj_vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed object query vector(s).	`None`
`label`	`Optional[str]`	Optional rel-label constraint for every hop.	`None`
`min_hops`	`int`	Minimum hop count, inclusive (default: 1).	`1`
`max_hops`	`int`	Maximum hop count, inclusive (default: 3).	`3`
`k`	`int`	Maximum number of results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`fulltext_threshold`	`Optional[float]`	Optional BM25 score threshold.	`None`
`ef_search`	`Optional[int]`	HNSW search-time candidate-list depth (`efs`), applied to both endpoints.	`None`
`conjunctive`	`bool`	When `True`, the BM25 branch runs as an AND-mode query; otherwise as OR.	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def path_hybrid_fts_search(
    self,
    subj_text_or_texts: Optional[Union[str, List[str]]] = None,
    obj_text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    subj_keywords: Optional[Union[str, List[str]]] = None,
    obj_keywords: Optional[Union[str, List[str]]] = None,
    subj_label: str,
    obj_label: str,
    subj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    obj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """Hybrid variable-length path search where BOTH endpoints match.

    AND-semantics. Each side is hybrid-searched (vec + fts)
    independently; per matching path the ``rrf_score`` is the
    sum of the subject-side and object-side hybrid scores.
    Falls back to fulltext-only when there are no vectors to
    search with on a side.

    Args:
        subj_text_or_texts: Query text (or list) for the subject.
            Ignored when ``subj_vector_or_vectors`` is supplied.
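        obj_text_or_texts: Query text (or list) for the object.
            Ignored when ``obj_vector_or_vectors`` is supplied.
        subj_keywords: Query text(s) for the subject-side BM25 branch.
        obj_keywords: Query text(s) for the object-side BM25 branch.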
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        subj_vector_or_vectors: Pre-computed subject query vector(s).
        obj_vector_or_vectors: Pre-computed object query vector(s).
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 score threshold.
        ef_search: HNSW ``efs`` knob applied to both endpoints.
        conjunctive: AND vs OR for the BM25 branch.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_hybrid_fts_search(
        subj_text_or_texts=subj_text_or_texts,
        obj_text_or_texts=obj_text_or_texts,
        subj_label=subj_label,
        obj_label=obj_label,
        subj_keywords=subj_keywords,
        obj_keywords=obj_keywords,
        subj_vector_or_vectors=subj_vector_or_vectors,
        obj_vector_or_vectors=obj_vector_or_vectors,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

`path_hybrid_regex_search(subj_text_or_texts=None, obj_text_or_texts=None, *, subj_pattern_or_patterns=None, obj_pattern_or_patterns=None, subj_label, obj_label, subj_vector_or_vectors=None, obj_vector_or_vectors=None, label=None, min_hops=1, max_hops=3, k=10, k_rank=60, similarity_threshold=None, fields=None, case_sensitive=True, output_format='json')` `async`

RRF of vector + regex variable-length path search, AND semantics.

Each side is hybrid-searched (vector + regex) independently; the path's rrf_score is the sum of the two endpoint hybrid scores. Falls back to path_similarity_search when no patterns are supplied. Each side's vector branch can be driven by pre-computed vectors instead of text.

Parameters:

Name	Type	Description	Default
`subj_text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list) for the subject vector branch. Ignored when `subj_vector_or_vectors` is supplied.	`None`
`obj_text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list) for the object vector branch. Ignored when `obj_vector_or_vectors` is supplied.	`None`
`subj_pattern_or_patterns`	`Optional[Union[str, List[str]]]`	Regex pattern (or list) for the subject.	`None`
`obj_pattern_or_patterns`	`Optional[Union[str, List[str]]]`	Regex pattern (or list) for the object.	`None`
`subj_label`	`str`	Entity label of the subject endpoint.	required
`obj_label`	`str`	Entity label of the object endpoint.	required
`subj_vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed subject query vector(s).	`None`
`obj_vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed object query vector(s).	`None`
`label`	`Optional[str]`	Optional rel-label constraint for every hop.	`None`
`min_hops`	`int`	Minimum hop count, inclusive (default: 1).	`1`
`max_hops`	`int`	Maximum hop count, inclusive (default: 3).	`3`
`k`	`int`	Maximum number of results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`fields`	`Optional[List[str]]`	Forwarded to the regex branch.	`None`
`case_sensitive`	`bool`	Forwarded to the regex branch.	`True`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def path_hybrid_regex_search(
    self,
    subj_text_or_texts: Optional[Union[str, List[str]]] = None,
    obj_text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    subj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    obj_pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    subj_label: str,
    obj_label: str,
    subj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    obj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    output_format: str = "json",
):
    """RRF of vector + regex variable-length path search, AND semantics.

    Each side is hybrid-searched (vec + regex) independently; the
    path's ``rrf_score`` is the sum of the two endpoint hybrid
    scores. Falls through to `path_similarity_search` when
    no patterns are supplied. Each side's vector branch can be
    driven by pre-computed vectors instead of text.

    Args:
        subj_text_or_texts: Query text (or list) for the subject
            vector branch. Ignored when ``subj_vector_or_vectors``
            is supplied.
        obj_text_or_texts: Query text (or list) for the object
            vector branch. Ignored when ``obj_vector_or_vectors``
            is supplied.
        subj_pattern_or_patterns: Regex pattern (or list) for the subject.
        obj_pattern_or_patterns: Regex pattern (or list) for the object.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        subj_vector_or_vectors: Pre-computed subject query vector(s).
        obj_vector_or_vectors: Pre-computed object query vector(s).
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fields: Forwarded to the regex branch.
        case_sensitive: Forwarded to the regex branch.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_hybrid_regex_search(
        subj_text_or_texts=subj_text_or_texts,
        obj_text_or_texts=obj_text_or_texts,
        subj_pattern_or_patterns=subj_pattern_or_patterns,
        obj_pattern_or_patterns=obj_pattern_or_patterns,
        subj_label=subj_label,
        obj_label=obj_label,
        subj_vector_or_vectors=subj_vector_or_vectors,
        obj_vector_or_vectors=obj_vector_or_vectors,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fields=fields,
        case_sensitive=case_sensitive,
        output_format=output_format,
    )

`path_regex_search(subj_pattern, obj_pattern, *, subj_label, obj_label, label=None, min_hops=1, max_hops=3, k=10, fields=None, case_sensitive=True, output_format='json')` `async`

Regex variable-length path search, AND semantics.

Both endpoints must match their respective regex pattern. A regex match is binary (an endpoint either matches or it does not), so there is no relevance score to sort on; paths are ranked by length, shortest first.
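The ranking rule can be illustrated as a filter followed by a sort on path length. Python's `re` module stands in for the engine's regex support here, each path is modelled as a plain list of node names, and `rank_regex_paths` is a hypothetical helper.

```python
import re


def rank_regex_paths(paths, subj_pattern, obj_pattern, case_sensitive=True, k=10):
    """Keep paths whose first and last nodes match, shortest paths first."""
    flags = 0 if case_sensitive else re.IGNORECASE
    subj_re = re.compile(subj_pattern, flags)
    obj_re = re.compile(obj_pattern, flags)

    matched = [
        path
        for path in paths
        if subj_re.search(path[0]) and obj_re.search(path[-1])
    ]
    # A path with n nodes has n - 1 hops, so sorting by length sorts by hops.
    matched.sort(key=len)
    return matched[:k]


paths = [
    ["Alice Smith", "Bob", "Acme Corp"],  # 2 hops
    ["Alice Jones", "Acme Inc"],          # 1 hop
    ["Carol", "Acme Corp"],               # subject does not match
]
ranked_paths = rank_regex_paths(paths, r"^Alice", r"^Acme")
```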

Parameters:

Name	Type	Description	Default
`subj_pattern`	`str`	Regex pattern for the subject endpoint.	required
`obj_pattern`	`str`	Regex pattern for the object endpoint.	required
`subj_label`	`str`	Entity label of the subject endpoint.	required
`obj_label`	`str`	Entity label of the object endpoint.	required
`label`	`Optional[str]`	Optional rel-label constraint for every hop.	`None`
`min_hops`	`int`	Minimum hop count, inclusive (default: 1).	`1`
`max_hops`	`int`	Maximum hop count, inclusive (default: 3).	`3`
`k`	`int`	Maximum number of results.	`10`
`fields`	`Optional[List[str]]`	Optional whitelist of fields, applied to both endpoints.	`None`
`case_sensitive`	`bool`	When `False`, matches case-insensitively.	`True`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def path_regex_search(
    self,
    subj_pattern: str,
    obj_pattern: str,
    *,
    subj_label: str,
    obj_label: str,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    output_format: str = "json",
):
    """Regex variable-length path search, AND semantics.

    Both endpoints must match their respective regex pattern.
    Regex is binary; ranking is by path length (shorter first).

    Args:
        subj_pattern: Regex pattern for the subject endpoint.
        obj_pattern: Regex pattern for the object endpoint.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        fields: Optional whitelist of fields, applied to both endpoints.
        case_sensitive: When ``False``, matches case-insensitively.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_regex_search(
        subj_pattern=subj_pattern,
        obj_pattern=obj_pattern,
        subj_label=subj_label,
        obj_label=obj_label,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        fields=fields,
        case_sensitive=case_sensitive,
        output_format=output_format,
    )

`path_similarity_search(subj_text_or_texts=None, obj_text_or_texts=None, *, subj_label, obj_label, subj_vector_or_vectors=None, obj_vector_or_vectors=None, label=None, min_hops=1, max_hops=3, k=10, subj_threshold=None, obj_threshold=None, ef_search=None, output_format='json')` `async`

Variable-length path search where BOTH endpoints match.

Returns paths of min_hops..max_hops edges whose start node is vector-close to the subject query AND whose end node is vector-close to the object query. label is an optional rel-label constraint applied to every hop; when omitted, any edge type is allowed.

Each row carries the full path: nodes (every node along the way, endpoints included), rels (every edge), and length (hop count), alongside the two endpoint distances and flattened endpoint PKs.
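
A rough sketch of what such rows look like, using a tiny hand-written graph. Everything here is hypothetical: the adjacency list, the precomputed distance dictionaries, and the `subj_distance` / `obj_distance` key names stand in for whatever the adapter actually returns, and the final ordering by hop count is an arbitrary choice for the example.

```python
# Tiny directed graph: node -> list of (rel_label, neighbour).
graph = {
    "alice": [("KNOWS", "bob")],
    "bob": [("WORKS_AT", "acme")],
    "acme": [("LOCATED_IN", "paris")],
    "paris": [],
}
# Hypothetical vector distances of nodes to the subject / object queries.
subj_distance = {"alice": 0.1, "bob": 0.7}
obj_distance = {"acme": 0.2, "paris": 0.3}


def walk(start, min_hops, max_hops):
    """Yield (nodes, rels) for every simple path of min_hops..max_hops edges."""
    stack = [([start], [])]
    while stack:
        nodes, rels = stack.pop()
        if min_hops <= len(rels) <= max_hops:
            yield nodes, rels
        if len(rels) < max_hops:
            for rel, neighbour in graph[nodes[-1]]:
                if neighbour not in nodes:
                    stack.append((nodes + [neighbour], rels + [rel]))


def path_rows(min_hops=1, max_hops=3, subj_threshold=0.5, obj_threshold=0.5):
    rows = []
    for start, d_subj in subj_distance.items():
        if d_subj > subj_threshold:
            continue  # start node is not close enough to the subject query
        for nodes, rels in walk(start, min_hops, max_hops):
            d_obj = obj_distance.get(nodes[-1])
            if d_obj is not None and d_obj <= obj_threshold:
                rows.append(
                    {
                        "nodes": nodes,
                        "rels": rels,
                        "length": len(rels),
                        "subj_distance": d_subj,
                        "obj_distance": d_obj,
                    }
                )
    return sorted(rows, key=lambda row: row["length"])


rows = path_rows()
```

Only paths starting at `alice` survive, because `bob` is farther than the subject threshold; the one-hop path to `bob` is dropped because its end node has no object-side match.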

Parameters:

Name	Type	Description	Default
`subj_text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list) for the subject. Ignored when `subj_vector_or_vectors` is supplied.	`None`
`obj_text_or_texts`	`Optional[Union[str, List[str]]]`	Query text (or list) for the object. Ignored when `obj_vector_or_vectors` is supplied.	`None`
`subj_label`	`str`	Entity label of the subject endpoint.	required
`obj_label`	`str`	Entity label of the object endpoint.	required
`subj_vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed subject query vector(s).	`None`
`obj_vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed object query vector(s).	`None`
`label`	`Optional[str]`	Optional rel-label constraint for every hop.	`None`
`min_hops`	`int`	Minimum hop count, inclusive (default: 1).	`1`
`max_hops`	`int`	Maximum hop count, inclusive (default: 3).	`3`
`k`	`int`	Maximum number of results.	`10`
`subj_threshold`	`Optional[float]`	Optional subject-side distance threshold.	`None`
`obj_threshold`	`Optional[float]`	Optional object-side distance threshold.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def path_similarity_search(
    self,
    subj_text_or_texts: Optional[Union[str, List[str]]] = None,
    obj_text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    subj_label: str,
    obj_label: str,
    subj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    obj_vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    label: Optional[str] = None,
    min_hops: int = 1,
    max_hops: int = 3,
    k: int = 10,
    subj_threshold: Optional[float] = None,
    obj_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Variable-length path search where BOTH endpoints match.

    Returns paths of ``min_hops..max_hops`` edges whose start
    node is vector-close to the subject query AND whose end node
    is vector-close to the object query. ``label`` is an optional
    rel-label constraint applied to every hop; when omitted, any
    edge type is allowed.

    Each row carries the full path: ``nodes`` (every node along
    the way, endpoints included), ``rels`` (every edge), and
    ``length`` (hop count), alongside the two endpoint distances
    and flattened endpoint PKs.

    Args:
        subj_text_or_texts: Query text (or list) for the subject.
            Ignored when ``subj_vector_or_vectors`` is supplied.
        obj_text_or_texts: Query text (or list) for the object.
            Ignored when ``obj_vector_or_vectors`` is supplied.
        subj_label: Entity label of the subject endpoint.
        obj_label: Entity label of the object endpoint.
        subj_vector_or_vectors: Pre-computed subject query vector(s).
        obj_vector_or_vectors: Pre-computed object query vector(s).
        label: Optional rel-label constraint for every hop.
        min_hops: Minimum hop count, inclusive (default: 1).
        max_hops: Maximum hop count, inclusive (default: 3).
        k: Maximum number of results.
        subj_threshold: Optional subject-side distance threshold.
        obj_threshold: Optional object-side distance threshold.
        ef_search: HNSW ``efs`` knob applied to both endpoint
            vector searches.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.path_similarity_search(
        subj_text_or_texts,
        obj_text_or_texts,
        subj_label=subj_label,
        obj_label=obj_label,
        subj_vector_or_vectors=subj_vector_or_vectors,
        obj_vector_or_vectors=obj_vector_or_vectors,
        label=label,
        min_hops=min_hops,
        max_hops=max_hops,
        k=k,
        subj_threshold=subj_threshold,
        obj_threshold=obj_threshold,
        ef_search=ef_search,
        output_format=output_format,
    )

`regex_search(pattern, *, table_name, fields=None, case_sensitive=True, k=10, output_format='json')` `async`

Find rows whose string fields match a regular expression.

DuckDB evaluates regexes with RE2, so patterns are linear-time and not vulnerable to catastrophic backtracking.
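
Because the engine is RE2, patterns written for a backtracking engine may need adjusting: RE2 does not support backreferences or lookaround. The helper below is a heuristic pre-check, not part of synalinks; its name and the short list of probes are assumptions for illustration, and it can misfire on escaped backslashes or character classes.

```python
import re

# A few constructs that backtracking engines accept but RE2 rejects.
_UNSUPPORTED = {
    "backreference": re.compile(r"\\[1-9]"),
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
}


def re2_incompatibilities(pattern: str) -> list:
    """Return the names of RE2-unsupported constructs spotted in the pattern."""
    return [name for name, probe in _UNSUPPORTED.items() if probe.search(pattern)]


print(re2_incompatibilities(r"(\w+) \1"))    # flags the backreference
print(re2_incompatibilities(r"foo(?=bar)"))  # flags the lookahead
print(re2_incompatibilities(r"(a+)+$"))      # nothing flagged
```

The last pattern is the classic catastrophic-backtracking shape; it is valid RE2 syntax, which is why the check lets it through.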

Parameters:

Name	Type	Description	Default
`pattern`	`str`	The regex pattern (RE2 syntax).	required
`table_name`	`str`	Target table.	required
`fields`	`Optional[List[str]]`	Field names to match against. Defaults to every string field on the schema. Names are snake_case-normalized to match stored column names.	`None`
`case_sensitive`	`bool`	When `False`, match case-insensitively.	`True`
`k`	`int`	Maximum number of results.	`10`
`output_format`	`str`	`"json"` (list of dicts, default) / `"csv"` (text).	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def regex_search(
    self,
    pattern: str,
    *,
    table_name: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    output_format: str = "json",
):
    """Find rows whose string fields match a regular expression.

    DuckDB evaluates regexes with RE2, so patterns are linear-time
    and not vulnerable to catastrophic backtracking.

    Args:
        pattern: The regex pattern (RE2 syntax).
        table_name: Target table.
        fields: Field names to match against. Defaults to every
            string field on the schema. Names are snake_case-
            normalized to match stored column names.
        case_sensitive: When ``False``, match case-insensitively.
        k: Maximum number of results.
        output_format: ``"json"`` (list of dicts, default) / ``"csv"`` (text).
    """
    return await self.sql_adapter.regex_search(
        pattern,
        table_name=table_name,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        output_format=output_format,
    )

`relation_fulltext_search(text_or_texts, *, label, k=10, threshold=None, conjunctive=False, bm25_b=None, output_format='json')` `async`

BM25 fulltext search over relations of a given label.

Per matched edge, the final score is the sum of the subject-side and object-side BM25 scores — either-endpoint union (edge surfaces if either endpoint matched).
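
The either-endpoint union can be pictured as merging two score maps. The sketch below assumes hypothetical per-endpoint BM25 hits already keyed by an edge identifier; the function and key names are illustrative only.

```python
# Hypothetical BM25 scores per edge, one map per endpoint side.
subj_scores = {"e1": 2.5, "e2": 1.0}
obj_scores = {"e2": 0.75, "e3": 3.0}


def combine_edge_scores(subj_scores, obj_scores, k=10):
    # Union: an edge surfaces if either endpoint matched.
    edges = set(subj_scores) | set(obj_scores)
    rows = [
        # A side that did not match contributes 0 to the sum.
        {"edge": e, "score": subj_scores.get(e, 0.0) + obj_scores.get(e, 0.0)}
        for e in edges
    ]
    rows.sort(key=lambda row: (-row["score"], row["edge"]))
    return rows[:k]


ranked = combine_edge_scores(subj_scores, obj_scores)
```

With these inputs `e3` ranks first on its object-side score alone, while `e2` (matched on both sides) gets the sum of its two scores.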

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Union[str, List[str]]`	Query text or list of query texts.	required
`label`	`str`	The relation label to search within.	required
`k`	`int`	Maximum number of results.	`10`
`threshold`	`Optional[float]`	Optional minimum BM25 threshold applied per endpoint.	`None`
`conjunctive`	`bool`	AND-mode query (every term must match).	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def relation_fulltext_search(
    self,
    text_or_texts: Union[str, List[str]],
    *,
    label: str,
    k: int = 10,
    threshold: Optional[float] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """BM25 fulltext search over relations of a given label.

    Per matched edge, the final ``score`` is the sum of the
    subject-side and object-side BM25 scores — either-endpoint
    union (edge surfaces if either endpoint matched).

    Args:
        text_or_texts: Query text or list of query texts.
        label: The relation label to search within.
        k: Maximum number of results.
        threshold: Optional minimum BM25 threshold applied per endpoint.
        conjunctive: AND-mode query (every term must match).
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_fulltext_search(
        text_or_texts,
        label=label,
        k=k,
        threshold=threshold,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

`relation_hybrid_fts_search(text_or_texts=None, *, keywords=None, label, vector_or_vectors=None, k=10, k_rank=60, similarity_threshold=None, fulltext_threshold=None, ef_search=None, conjunctive=False, bm25_b=None, output_format='json')` `async`

RRF of vector + BM25 fulltext over relations of a label.

Either-endpoint union: per matched edge, the final rrf_score is the sum of the subject-side and object-side hybrid scores — equivalent to a 4-source RRF. Falls back to fulltext-only when there are no vectors to search with.
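
For intuition, here is a minimal reciprocal rank fusion (RRF) sketch using the common formulation, where each source contributes `1 / (k_rank + rank)` with 1-based ranks. Whether the adapter uses exactly this variant is an assumption, and the four ranked lists are made-up edge ids ranked directly rather than derived from entity hits.

```python
def rrf(rankings, k_rank=60):
    """Fuse ranked id lists (best first) into a {id: rrf_score} map."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k_rank + rank)
    return scores


# Four sources: subject-vector, subject-BM25, object-vector, object-BM25.
sources = [["e1", "e2"], ["e2"], ["e3", "e1"], ["e1"]]
scores = rrf(sources)
best = max(scores, key=scores.get)
```

An edge that appears in several sources accumulates score, so `e1` (three lists) beats `e2` (two lists), which beats `e3` (one list). A larger `k_rank` flattens the difference between ranks.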

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts. Ignored when `vector_or_vectors` is supplied.	`None`
`keywords`	`Optional[Union[str, List[str]]]`	Query text(s) for the BM25 branch.	`None`
`label`	`str`	The relation label to search within.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector(s) for the vector branch, matched against both endpoints.	`None`
`k`	`int`	Maximum number of results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`fulltext_threshold`	`Optional[float]`	Optional BM25 score threshold.	`None`
`ef_search`	`Optional[int]`	HNSW `efs` knob for the vector branch.	`None`
`conjunctive`	`bool`	AND vs OR for the BM25 branch.	`False`
`bm25_b`	`Optional[float]`	Optional override for BM25's `b` parameter.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def relation_hybrid_fts_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    keywords: Optional[Union[str, List[str]]] = None,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    fulltext_threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    conjunctive: bool = False,
    bm25_b: Optional[float] = None,
    output_format: str = "json",
):
    """RRF of vector + BM25 fulltext over relations of a label.

    Either-endpoint union: per matched edge, the final
    ``rrf_score`` is the sum of the subject-side and
    object-side hybrid scores — equivalent to a 4-source RRF.
    Falls back to fulltext-only when there are no vectors to
    search with.

    Args:
        text_or_texts: Query text or list of query texts. Ignored
            when ``vector_or_vectors`` is supplied.
        keywords: Query text(s) for the BM25 branch.
        label: The relation label to search within.
        vector_or_vectors: Pre-computed query vector(s) for the
            vector branch, matched against both endpoints.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        fulltext_threshold: Optional BM25 score threshold.
        ef_search: HNSW ``efs`` knob for the vector branch.
        conjunctive: AND vs OR for the BM25 branch.
        bm25_b: Optional override for BM25's ``b`` parameter.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_hybrid_fts_search(
        text_or_texts=text_or_texts,
        label=label,
        keywords=keywords,
        vector_or_vectors=vector_or_vectors,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        fulltext_threshold=fulltext_threshold,
        ef_search=ef_search,
        conjunctive=conjunctive,
        bm25_b=bm25_b,
        output_format=output_format,
    )

`relation_hybrid_regex_search(text_or_texts=None, *, pattern_or_patterns=None, label, vector_or_vectors=None, fields=None, case_sensitive=True, k=10, k_rank=60, similarity_threshold=None, output_format='json')` `async`

RRF of vector similarity + regex match over relations.

Per matched edge, the final rrf_score is the sum of the subject's and the object's hybrid scores — same 4-source-RRF reduction as relation_hybrid_fts_search. Falls through to relation_similarity_search when no patterns are supplied.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts for the vector branch. Ignored when `vector_or_vectors` is supplied.	`None`
`pattern_or_patterns`	`Optional[Union[str, List[str]]]`	Regex pattern (or list) for the regex branch.	`None`
`label`	`str`	The relation label.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector(s) for the vector branch, matched against both endpoints.	`None`
`fields`	`Optional[List[str]]`	Forwarded to `entity_regex_search`.	`None`
`case_sensitive`	`bool`	Forwarded to `entity_regex_search`.	`True`
`k`	`int`	Maximum number of results.	`10`
`k_rank`	`int`	RRF smoothing constant.	`60`
`similarity_threshold`	`Optional[float]`	Optional vector-distance threshold.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def relation_hybrid_regex_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    pattern_or_patterns: Optional[Union[str, List[str]]] = None,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    k_rank: int = 60,
    similarity_threshold: Optional[float] = None,
    output_format: str = "json",
):
    """RRF of vector similarity + regex match over relations.

    Per matched edge, the final ``rrf_score`` is the sum of the
    subject's and the object's hybrid scores — same 4-source-RRF
    reduction as `relation_hybrid_fts_search`. Falls through
    to `relation_similarity_search` when no patterns are
    supplied.

    Args:
        text_or_texts: Query text or list of query texts for the
            vector branch. Ignored when ``vector_or_vectors`` is
            supplied.
        pattern_or_patterns: Regex pattern (or list) for the regex branch.
        label: The relation label.
        vector_or_vectors: Pre-computed query vector(s) for the
            vector branch, matched against both endpoints.
        fields: Forwarded to `entity_regex_search`.
        case_sensitive: Forwarded to `entity_regex_search`.
        k: Maximum number of results.
        k_rank: RRF smoothing constant.
        similarity_threshold: Optional vector-distance threshold.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_hybrid_regex_search(
        text_or_texts=text_or_texts,
        pattern_or_patterns=pattern_or_patterns,
        label=label,
        vector_or_vectors=vector_or_vectors,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        k_rank=k_rank,
        similarity_threshold=similarity_threshold,
        output_format=output_format,
    )

`relation_regex_search(pattern, *, label, fields=None, case_sensitive=True, k=10, output_format='json')` `async`

Regex search over relations of a given label.

Composed via entity_regex_search on each endpoint. Regex hits are binary; the row's score is 2.0 when both endpoints matched and 1.0 when only one did, with matched_on indicating the side(s).
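
The scoring rule is simple enough to sketch with the standard `re` module. The function below is illustrative, not the adapter's code: it takes one text per endpoint and assumes `matched_on` uses the same `"subj"` / `"obj"` / `"both"` tags documented for `relation_similarity_search`.

```python
import re


def score_edge(subj_text, obj_text, pattern, case_sensitive=True):
    """Return the score row for one edge, or None when neither side matches."""
    flags = 0 if case_sensitive else re.IGNORECASE
    subj_hit = re.search(pattern, subj_text, flags) is not None
    obj_hit = re.search(pattern, obj_text, flags) is not None
    if subj_hit and obj_hit:
        return {"score": 2.0, "matched_on": "both"}
    if subj_hit:
        return {"score": 1.0, "matched_on": "subj"}
    if obj_hit:
        return {"score": 1.0, "matched_on": "obj"}
    return None


both = score_edge("Acme Corp", "Acme Labs", r"^Acme")
one = score_edge("Acme Corp", "Globex", r"^Acme")
```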

Parameters:

Name	Type	Description	Default
`pattern`	`str`	The regex pattern.	required
`label`	`str`	The relation label to search within.	required
`fields`	`Optional[List[str]]`	Optional whitelist of fields, applied to both endpoints.	`None`
`case_sensitive`	`bool`	When `False`, matches case-insensitively.	`True`
`k`	`int`	Maximum number of rows.	`10`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def relation_regex_search(
    self,
    pattern: str,
    *,
    label: str,
    fields: Optional[List[str]] = None,
    case_sensitive: bool = True,
    k: int = 10,
    output_format: str = "json",
):
    """Regex search over relations of a given label.

    Composed via `entity_regex_search` on each endpoint.
    Regex hits are binary; the row's ``score`` is 2.0 when both
    endpoints matched and 1.0 when only one did, with
    ``matched_on`` indicating the side(s).

    Args:
        pattern: The regex pattern.
        label: The relation label to search within.
        fields: Optional whitelist of fields, applied to both endpoints.
        case_sensitive: When ``False``, matches case-insensitively.
        k: Maximum number of rows.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_regex_search(
        pattern,
        label=label,
        fields=fields,
        case_sensitive=case_sensitive,
        k=k,
        output_format=output_format,
    )

`relation_similarity_search(text_or_texts=None, *, label, vector_or_vectors=None, k=10, threshold=None, ef_search=None, output_format='json')` `async`

Vector similarity search over relations of a given label.

The query matches against BOTH endpoints (subject and object); the adapter returns one row per matched edge with its best (lowest) distance and a matched_on tag ("subj", "obj", or "both").
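
A small sketch of how one row per edge could be derived from two endpoint searches. The input maps (edge id to distance) and the output key names are hypothetical; only the "best (lowest) distance" rule and the `matched_on` tags come from the description above.

```python
def merge_endpoint_hits(subj_hits, obj_hits, threshold=None, k=10):
    def keep(distance):
        # Drop a side that is missing or farther than the threshold.
        if distance is None:
            return None
        if threshold is not None and distance > threshold:
            return None
        return distance

    rows = []
    for edge in set(subj_hits) | set(obj_hits):
        d_subj = keep(subj_hits.get(edge))
        d_obj = keep(obj_hits.get(edge))
        if d_subj is None and d_obj is None:
            continue
        if d_subj is not None and d_obj is not None:
            matched_on = "both"
        else:
            matched_on = "subj" if d_subj is not None else "obj"
        best = min(d for d in (d_subj, d_obj) if d is not None)
        rows.append({"edge": edge, "distance": best, "matched_on": matched_on})
    rows.sort(key=lambda row: (row["distance"], row["edge"]))
    return rows[:k]


hits = merge_endpoint_hits({"e1": 0.2, "e2": 0.9}, {"e2": 0.4, "e3": 0.1})
```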

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts. Ignored when `vector_or_vectors` is supplied.	`None`
`label`	`str`	The relation label to search within.	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	Pre-computed query vector or list of vectors to search with directly (matched against both endpoints).	`None`
`k`	`int`	Maximum number of results.	`10`
`threshold`	`Optional[float]`	Optional vector-distance threshold per endpoint.	`None`
`ef_search`	`Optional[int]`	HNSW `efs` knob applied to both endpoint vector searches.	`None`
`output_format`	`str`	`"json"` (default) or `"csv"`.	`'json'`

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def relation_similarity_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    label: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Vector similarity search over relations of a given label.

    The query matches against BOTH endpoints (subject and
    object); the adapter returns one row per matched edge with
    its best (lowest) distance and a ``matched_on`` tag
    (``"subj"``, ``"obj"``, or ``"both"``).

    Args:
        text_or_texts: Query text or list of query texts. Ignored
            when ``vector_or_vectors`` is supplied.
        label: The relation label to search within.
        vector_or_vectors: Pre-computed query vector or list of
            vectors to search with directly (matched against both
            endpoints).
        k: Maximum number of results.
        threshold: Optional vector-distance threshold per endpoint.
        ef_search: HNSW ``efs`` knob applied to both endpoint
            vector searches.
        output_format: ``"json"`` (default) or ``"csv"``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.relation_similarity_search(
        text_or_texts,
        label=label,
        vector_or_vectors=vector_or_vectors,
        k=k,
        threshold=threshold,
        ef_search=ef_search,
        output_format=output_format,
    )

`rename(source, *, table_name=None, table_description=None)` `async`

Rename a table and/or update its description.

Pass at least one of table_name / table_description. When table_name is given the underlying table is renamed via ALTER TABLE …, the FTS / vector indexes are rebuilt under the new name, and the adapter's known-models list is updated so subsequent default-table searches find the table under its new identity.

Parameters:

Name	Type	Description	Default
`source`	`Any`	`SymbolicDataModel` or table-name string for the table to rename. The string form is itself PascalCase-normalized, so callers can pass the same input they used in `from_csv` (e.g. `"my-docs"`).	required
`table_name`	`Optional[str]`	New table name. Always normalized to PascalCase.	`None`
`table_description`	`Optional[str]`	Optional natural-language description attached to the resulting schema.	`None`
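
To give a feel for the PascalCase normalization of names such as `"my-docs"`, here is one plausible rule written out. This is an assumption for illustration: the library's actual normalizer may treat digits, acronyms or existing capitalization differently.

```python
import re


def to_pascal_case(name: str) -> str:
    # Split on any run of non-alphanumeric characters, then
    # upper-case the first character of each remaining part.
    parts = [part for part in re.split(r"[^0-9A-Za-z]+", name) if part]
    return "".join(part[0].upper() + part[1:] for part in parts)


print(to_pascal_case("my-docs"))
print(to_pascal_case("my_docs"))
print(to_pascal_case("MyDocs"))
```

Under this rule all three inputs resolve to the same identifier, which is the property that lets callers pass the same string they used when the table was created.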

Returns:

Type	Description
`Any`	A fresh `SymbolicDataModel` for the (possibly renamed) table.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def rename(
    self,
    source: Any,
    *,
    table_name: Optional[str] = None,
    table_description: Optional[str] = None,
) -> Any:
    """Rename a table and/or update its description.

    Pass at least one of ``table_name`` / ``table_description``.
    When ``table_name`` is given the underlying table is
    renamed via ``ALTER TABLE …``, the FTS / vector indexes are
    rebuilt under the new name, and the adapter's known-models
    list is updated so subsequent default-table searches find
    the table under its new identity.

    Args:
        source: ``SymbolicDataModel`` or table-name string for
            the table to rename. The string form is itself
            PascalCase-normalized, so callers can pass the
            same input they used in `from_csv` (e.g.
            ``"my-docs"``).
        table_name: New table name. Always normalized to
            PascalCase.
        table_description: Optional natural-language description
            attached to the resulting schema.

    Returns:
        A fresh `SymbolicDataModel` for the (possibly
        renamed) table.
    """
    return await self.sql_adapter.rename(
        source,
        table_name=table_name,
        table_description=table_description,
    )

`similarity_search(text_or_texts=None, *, table_name, vector_or_vectors=None, k=10, threshold=None, ef_search=None, output_format='json')` `async`

Vector similarity search against a single table.

Parameters:

Name	Type	Description	Default
`text_or_texts`	`Optional[Union[str, List[str]]]`	Query text or list of query texts. Ignored when `vector_or_vectors` is supplied.	`None`
`table_name`	`str`	Target table (single-table search).	required
`vector_or_vectors`	`Optional[Union[List[float], List[List[float]]]]`	A pre-computed query vector, or a list of vectors, to search with directly instead of embedding `text_or_texts`. When supplied, no embedding model is required on the knowledge base.	`None`
`k`	`int`	Maximum number of results to return.	`10`
`threshold`	`Optional[float]`	Optional maximum vector-distance threshold.	`None`
`ef_search`	`Optional[int]`	HNSW search-time candidate-list depth. `None` keeps the index-time value (or the engine default). Higher = better recall, slower query.	`None`
`output_format`	`str`	`"json"` (default, list of dicts — JSON-shaped Python data) or `"csv"` (CSV string, useful for handing results to an LM since CSV is ~30-50% fewer tokens than equivalent JSON).	`'json'`
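
The size difference between the two output formats is easy to see with the standard library. The rows below are invented, and the comparison counts characters rather than tokens, so treat it as a rough indication only; the exact saving depends on the tokenizer and on the data.

```python
import csv
import io
import json

# Made-up search results in the "json" shape (a list of dicts).
rows = [
    {"id": "1", "title": "Hello", "content": "Hello World!"},
    {"id": "2", "title": "Bye", "content": "Goodbye World!"},
]


def to_csv(rows):
    """Serialize a list of same-shaped dicts as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


as_json = json.dumps(rows)
as_csv = to_csv(rows)
print(len(as_json), len(as_csv))
```

CSV states the field names once in the header, whereas JSON repeats every key (plus quotes and braces) for each row, so the gap widens as the number of rows grows.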

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def similarity_search(
    self,
    text_or_texts: Optional[Union[str, List[str]]] = None,
    *,
    table_name: str,
    vector_or_vectors: Optional[Union[List[float], List[List[float]]]] = None,
    k: int = 10,
    threshold: Optional[float] = None,
    ef_search: Optional[int] = None,
    output_format: str = "json",
):
    """Vector similarity search against a single table.

    Args:
        text_or_texts: Query text or list of query texts. Ignored
            when ``vector_or_vectors`` is supplied.
        table_name: Target table (single-table search).
        vector_or_vectors: A pre-computed query vector, or a list of
            vectors, to search with directly instead of embedding
            ``text_or_texts``. When supplied, no embedding model is
            required on the knowledge base.
        k: Maximum number of results to return.
        threshold: Optional maximum vector-distance threshold.
        ef_search: HNSW search-time candidate-list depth.
            ``None`` keeps the index-time value (or the engine
            default). Higher = better recall, slower query.
        output_format: ``"json"`` (default, list of dicts —
            JSON-shaped Python data) or ``"csv"`` (CSV string,
            useful for handing results to an LM since CSV is
            ~30-50% fewer tokens than equivalent JSON).
    """
    return await self.sql_adapter.similarity_search(
        text_or_texts,
        table_name=table_name,
        vector_or_vectors=vector_or_vectors,
        k=k,
        threshold=threshold,
        ef_search=ef_search,
        output_format=output_format,
    )

`sql(sql, *, params=None, output_format='json', **kwargs)` `async`

Execute a raw SQL query against the knowledge base.

Counterpart of cypher — the method is named after the query language so a dual-adapter KnowledgeBase has a clear per-language entry point.

Parameters:

Name	Type	Description	Default
`sql`	`str`	The SQL string to execute.	required
`params`	`dict`	Optional dict of parameters for parameterized queries.	`None`
`output_format`	`str`	`"json"` (default, list of dicts — JSON-shaped Python data) or `"csv"` (CSV string, useful when handing the result to an LM).	`'json'`
`**kwargs`	`Any`	Additional options. The most important one is `read_only=True/False`. When `True` (the DuckDB adapter's default) two layers of defence apply: (1) The SQL is parsed with the engine's own parser and any non-`SELECT` statement is rejected. This catches multi-statement injection (e.g. `SELECT 1; DROP TABLE x`), `COPY ... TO 'file'` exfiltration, `ATTACH`, `EXPORT`, and other side-effecting statements. This is the only layer that blocks writes: the adapter's underlying connection is read-write (one connection per adapter, reused across operations), so the parser check is what keeps untrusted SQL read-only. (2) `enable_external_access` is disabled on that connection at construction time, so `SELECT` table functions that touch the host filesystem or network (`read_csv`, `read_parquet`, `read_json`, `read_blob`, `read_text`, `glob` and the httpfs/S3 variants) return a permission error instead of leaking files. Without this layer, `SELECT * FROM read_csv('/etc/passwd', ...)` would pass defence (1) because it is a syntactically valid `SELECT`. Pass `read_only=False` only from trusted call sites that genuinely need to mutate state. Those paths still run on the same sandboxed connection (no external I/O), but they bypass the parser check, so any SQL is accepted; keep them out of the LM-tool-call surface.	`{}`

Returns:

Type	Description
`Union[List[Dict[str, Any]], str]`	A list of dicts when `output_format="json"`, or a CSV string when `output_format="csv"`.
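
The interaction between the two defence layers and `read_only` can be modelled with a toy gate. This is a conceptual sketch only: the `Statement` class and `admit` function are hypothetical, the statement kinds are assumed to come from a real SQL parser, and the actual external-access block is enforced by the engine rather than by a flag check.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    kind: str                  # statement type a parser would report, e.g. "SELECT"
    external_io: bool = False  # True for read_csv(...), httpfs reads, COPY TO, ...


def admit(statements, read_only=True):
    """Return True if the statements pass both layers, else raise PermissionError."""
    # Layer 1 (only when read_only): reject anything that is not a SELECT.
    if read_only:
        for stmt in statements:
            if stmt.kind != "SELECT":
                raise PermissionError(f"{stmt.kind} rejected in read-only mode")
    # Layer 2 (always on): the connection has no external access.
    for stmt in statements:
        if stmt.external_io:
            raise PermissionError("external file/network access is disabled")
    return True


# A plain SELECT passes; a trusted write passes only with read_only=False.
admit([Statement("SELECT")])
admit([Statement("INSERT")], read_only=False)
```

In this model a `SELECT` that reads a host file passes layer 1 and is stopped by layer 2, while `read_only=False` skips layer 1 but never layer 2.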

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def sql(
    self,
    sql: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    output_format: str = "json",
    **kwargs,
) -> Union[List[Dict[str, Any]], str]:
    """Execute a raw SQL query against the knowledge base.

    Counterpart of `cypher` — the method is named after the
    query language so a dual-adapter KnowledgeBase has a clear
    per-language entry point.

    Args:
        sql (str): The SQL string to execute.
        params (dict): Optional dict of parameters for parameterized queries.
        output_format: ``"json"`` (default, list of dicts —
            JSON-shaped Python data) or ``"csv"`` (CSV string,
            useful when handing the result to an LM).
        **kwargs (Any): Additional options. The most important one is
            ``read_only=True/False``. When ``True`` (the DuckDB adapter's
            default) two layers of defence apply:

            1. The SQL is parsed with the engine's own parser and any
               non-``SELECT`` statement is rejected. This catches
               multi-statement injection (e.g. ``SELECT 1; DROP TABLE x``),
               ``COPY ... TO 'file'`` exfiltration, ``ATTACH``, ``EXPORT``,
               and other side-effecting statements. This is the only
               layer that blocks writes — the adapter's underlying
               connection is read-write (one connection per adapter,
               reused across operations), so the parser check is what
               keeps untrusted SQL read-only.
            2. ``enable_external_access`` is disabled on that connection
               at construction time, so ``SELECT`` table functions that
               touch the host filesystem or network — ``read_csv``,
               ``read_parquet``, ``read_json``, ``read_blob``,
               ``read_text``, ``glob`` and the httpfs/S3 variants —
               return a permission error instead of leaking files.
               Without this layer,
               ``SELECT * FROM read_csv('/etc/passwd', ...)`` would pass
               defence (1) because it is a syntactically valid ``SELECT``.

            Pass ``read_only=False`` only from trusted call sites that
            genuinely need to mutate state. Those paths still run on
            the same sandboxed connection (no external I/O), but they
            bypass the parser check, so any SQL is accepted — keep them
            out of the LM-tool-call surface.

    Returns:
        (Union[List[Dict[str, Any]], str]): A list of dicts when
            ``output_format="json"``, or a CSV string when
            ``output_format="csv"``.
    """
    return await self.sql_adapter.sql(
        sql, params=params, output_format=output_format, **kwargs
    )
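
As a rough illustration of the `params` and `output_format` contract described above, the sketch below runs a parameterized query and shapes the result both ways. It uses the standard-library `sqlite3` module purely as a stand-in for the DuckDB adapter. `run_query` is a hypothetical helper, the `:id` placeholder syntax is SQLite's, and the CSV dialect the real adapter emits is an assumption.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, List, Optional, Union


def run_query(
    conn: sqlite3.Connection,
    sql: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    output_format: str = "json",
) -> Union[List[Dict[str, Any]], str]:
    """Run one query and return a list of dicts or a CSV string."""
    cursor = conn.execute(sql, params or {})
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if output_format == "json":
        return rows
    if output_format == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"Unsupported output_format: {output_format!r}")


# Hypothetical in-memory table standing in for a knowledge base table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Document (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO Document VALUES (?, ?)",
    [("1", "Hello"), ("2", "World")],
)

# Values travel in `params`, never through string formatting of the SQL.
query = "SELECT id, title FROM Document WHERE id = :id"
as_json = run_query(conn, query, params={"id": "1"})
as_csv = run_query(conn, query, params={"id": "1"}, output_format="csv")
```

The CSV form states the column names once in the header row, which tends to be more compact than repeating every key per row when the result is handed to an LM.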

`update(data_model_or_data_models, *, verbose='auto')` `async`

Insert or update records in the knowledge base.

Parameters:

Name	Type	Description	Default
`data_model_or_data_models`	`JsonDataModel \| List[JsonDataModel] \| Dataset`	A single `JsonDataModel`, a list of `JsonDataModel` / `DataModel` instances, or a synalinks `Dataset`. The `Dataset` form streams the source batch-by-batch (one `adapter.update` call per yielded batch) so memory stays bounded for large CSV / Parquet / HuggingFace sources. The dataset must be inputs-only — no `output_template` — because the knowledge base stores records, not `(input, target)` pairs; pass a labeled dataset and you'll get a `ValueError`. Upserts key off the first declared field of the model — see the "Primary Key Convention" section on the class docstring for how that's resolved (and why no UUID is injected).	required
`verbose`	`int \| str`	`"auto"`, `0`, `1`, or `2`. Verbosity for the `Dataset` path; matches the trainer's `fit()` semantics. `"auto"` (default) resolves to `1` when a `Dataset` is passed (a per-batch progress bar — same widget `fit()` uses, with ETA when `len(dataset)` is known) and is a no-op for the scalar / list forms, which finish in a single adapter call.	`'auto'`

Returns:

Type	Description
`Union[Any, List[Any]]`	The primary key value(s) of the inserted/updated records. Scalar in / scalar out; list in / list out; `Dataset` in / flat list of every batch's ids concatenated.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def update(
    self,
    data_model_or_data_models: Union[Any, List[Any], Dataset],
    *,
    verbose="auto",
) -> Union[Any, List[Any]]:
    """Insert or update records in the knowledge base.

    Args:
        data_model_or_data_models (JsonDataModel | List[JsonDataModel] | Dataset):
            A single ``JsonDataModel``, a list of ``JsonDataModel`` /
            ``DataModel`` instances, or a synalinks ``Dataset``.
            The ``Dataset`` form streams the source batch-by-batch
            (one ``adapter.update`` call per yielded batch) so memory
            stays bounded for large CSV / Parquet / HuggingFace
            sources. The dataset must be inputs-only — no
            ``output_template`` — because the knowledge base stores
            records, not ``(input, target)`` pairs; pass a
            labeled dataset and you'll get a ``ValueError``.

            Upserts key off the first declared field of the model —
            see the "Primary Key Convention" section on the class
            docstring for how that's resolved (and why no UUID is
            injected).
        verbose (int | str): ``"auto"``, ``0``, ``1``, or ``2``.
            Verbosity for the ``Dataset`` path; matches the
            trainer's ``fit()`` semantics. ``"auto"`` (default)
            resolves to ``1`` when a ``Dataset`` is passed (a
            per-batch progress bar — same widget ``fit()`` uses,
            with ETA when ``len(dataset)`` is known) and is a
            no-op for the scalar / list forms, which finish in a
            single adapter call.

    Returns:
        The primary key value(s) of the inserted/updated records.
        Scalar in / scalar out; list in / list out; ``Dataset`` in /
        flat list of every batch's ids concatenated.
    """
    if isinstance(data_model_or_data_models, Dataset):
        return await self._update_from_dataset(
            data_model_or_data_models, verbose=verbose
        )
    return await self.sql_adapter.update(data_model_or_data_models)
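
The return-shape contract and the first-field key rule can be sketched without a database. The example below is a minimal in-memory stand-in, not the synalinks implementation: `InMemoryTable` and `update_from_batches` are hypothetical names, plain dataclasses replace `DataModel`, and a list of lists replaces a real `Dataset`.

```python
import asyncio
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Iterable, List, Union


@dataclass
class Document:
    id: str  # first declared field, so it acts as the primary key
    title: str


class InMemoryTable:
    """Hypothetical stand-in for the SQL adapter: a dict keyed by primary key."""

    def __init__(self) -> None:
        self.rows: Dict[Any, Dict[str, Any]] = {}

    async def update(
        self, record_or_records: Union[Any, List[Any]]
    ) -> Union[Any, List[Any]]:
        # Scalar in / scalar out; list in / list out.
        if isinstance(record_or_records, list):
            return [self._upsert(record) for record in record_or_records]
        return self._upsert(record_or_records)

    def _upsert(self, record: Any) -> Any:
        key = getattr(record, fields(record)[0].name)
        self.rows[key] = asdict(record)  # insert, or overwrite the existing row
        return key


async def update_from_batches(
    table: InMemoryTable, batches: Iterable[List[Any]]
) -> List[Any]:
    """One adapter call per batch; ids are concatenated into one flat list."""
    ids: List[Any] = []
    for batch in batches:
        ids.extend(await table.update(batch))
    return ids


async def main():
    table = InMemoryTable()
    scalar_id = await table.update(Document("1", "Hello"))
    list_ids = await table.update(
        [Document("1", "Hello again"), Document("2", "World")]
    )
    batch_ids = await update_from_batches(
        table, [[Document("3", "a")], [Document("4", "b"), Document("5", "c")]]
    )
    return table, scalar_id, list_ids, batch_ids


table, scalar_id, list_ids, batch_ids = asyncio.run(main())
```

Writing the record with id `"1"` a second time overwrites the first row rather than adding a new one, which is the upsert behaviour the docstring describes.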

`update_entities(entity_or_entities)` `async`

Insert or update one or more entities (nodes) in the graph.

Graph-side counterpart of the SQL update. The name mirrors the Entities data model; pass either a single Entity or a list — the return shape matches the input.

Parameters:

Name	Type	Description	Default
`entity_or_entities`	`Union[Any, List[Any]]`	An `Entity` instance, or a list of them (or anything satisfying `is_entity`).	required

Returns:

Type	Description
`Union[Any, List[Any]]`	The node id(s) assigned by the backend. Scalar in / scalar out; list in / list out.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def update_entities(
    self,
    entity_or_entities: Union[Any, List[Any]],
) -> Union[Any, List[Any]]:
    """Insert or update one or more entities (nodes) in the graph.

    Graph-side counterpart of the SQL `update`. The name
    mirrors the `Entities` data model; pass either a single
    ``Entity`` or a list — the return shape matches the input.

    Args:
        entity_or_entities: An ``Entity`` instance, or a list of
            them (or anything satisfying ``is_entity``).

    Returns:
        The node id(s) assigned by the backend. Scalar in / scalar
        out; list in / list out.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.update_entities(entity_or_entities)
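
A minimal sketch of the node upsert semantics, assuming entities are plain dicts and that the returned id is the primary key value (the first property after `label`, per the Primary Key Convention). `entity_key` and `upsert_entities` are hypothetical helpers for illustration, not part of the graph adapter.

```python
from typing import Any, Dict, List, Tuple, Union

Node = Dict[str, Any]


def entity_key(entity: Node) -> Tuple[str, Any]:
    """Return (label, primary key); the key is the first property after `label`."""
    label = entity["label"]
    for name, value in entity.items():
        if name != "label":
            return label, value
    raise ValueError("an entity needs at least one property besides 'label'")


def upsert_entities(
    store: Dict[Tuple[str, Any], Node],
    entity_or_entities: Union[Node, List[Node]],
) -> Union[Any, List[Any]]:
    """Upsert nodes; scalar in / scalar out, list in / list out."""
    is_list = isinstance(entity_or_entities, list)
    entities = entity_or_entities if is_list else [entity_or_entities]
    ids = []
    for entity in entities:
        label, key = entity_key(entity)
        store[(label, key)] = dict(entity)  # same label + key overwrites
        ids.append(key)
    return ids if is_list else ids[0]


store: Dict[Tuple[str, Any], Node] = {}
one_id = upsert_entities(
    store, {"label": "City", "name": "Paris", "country": "unknown"}
)
many_ids = upsert_entities(
    store,
    [
        {"label": "City", "name": "Paris", "country": "France"},
        {"label": "Country", "name": "France"},
    ],
)
```

Because the key is scoped by label, the `City` node and the `Country` node coexist even if their key values happen to match.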

`update_knowledge_graph(knowledge_graph)` `async`

Bulk-insert a full knowledge graph (entities + relations).

Equivalent to calling update_entities then update_relations, but concrete adapters may optimize the combined path.

Parameters:

Name	Type	Description	Default
`knowledge_graph`	`Any`	A `KnowledgeGraph` instance.	required

Returns:

Type	Description
`Any`	A dict with `{"entities": [...ids...], "relations": [...ids...]}`.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def update_knowledge_graph(self, knowledge_graph: Any) -> Any:
    """Bulk-insert a full knowledge graph (entities + relations).

    Equivalent to calling `update_entities` then
    `update_relations`, but concrete adapters may optimize
    the combined path.

    Args:
        knowledge_graph: A ``KnowledgeGraph`` instance.

    Returns:
        A dict with ``{"entities": [...ids...], "relations":
        [...ids...]}``.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.update_knowledge_graph(knowledge_graph)
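
The equivalence with `update_entities` followed by `update_relations` can be sketched with a stub adapter that records the call order. Everything here is an assumption made for illustration: `RecordingGraphAdapter` is hypothetical, the knowledge graph is a plain dict with `entities` and `relations` lists, and the ids are simply the first non-reserved property of each item.

```python
import asyncio
from typing import Any, Dict, List


class RecordingGraphAdapter:
    """Hypothetical adapter stub that records the order of calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def update_entities(self, entities: List[Dict[str, Any]]) -> List[Any]:
        self.calls.append("update_entities")
        return [entity["name"] for entity in entities]

    async def update_relations(self, relations: List[Dict[str, Any]]) -> List[Any]:
        self.calls.append("update_relations")
        return [relation["id"] for relation in relations]


async def update_knowledge_graph(
    adapter: RecordingGraphAdapter, knowledge_graph: Dict[str, Any]
) -> Dict[str, List[Any]]:
    """Unoptimized path: entities first, then relations."""
    entity_ids = await adapter.update_entities(knowledge_graph["entities"])
    relation_ids = await adapter.update_relations(knowledge_graph["relations"])
    return {"entities": entity_ids, "relations": relation_ids}


graph = {
    "entities": [
        {"label": "City", "name": "Paris"},
        {"label": "Country", "name": "France"},
    ],
    "relations": [
        {"subj": "Paris", "label": "IsCapitalOf", "obj": "France", "id": "r1"},
    ],
}
adapter = RecordingGraphAdapter()
result = asyncio.run(update_knowledge_graph(adapter, graph))
```

Writing entities before relations means both endpoints already exist by the time each edge is written.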

`update_relations(relation_or_relations)` `async`

Insert or update one or more relations (edges) in the graph.

Mirrors the Relations data model. Each relation's subj and obj are upserted as needed so every edge has both endpoints.

Parameters:

Name	Type	Description	Default
`relation_or_relations`	`Union[Any, List[Any]]`	A `Relation` instance, or a list of them (or anything satisfying `is_relation`).	required

Returns:

Type	Description
`Union[Any, List[Any]]`	The edge id(s) assigned by the backend. Scalar in / scalar out; list in / list out.

Source code in synalinks/src/knowledge_bases/knowledge_base.py

async def update_relations(
    self,
    relation_or_relations: Union[Any, List[Any]],
) -> Union[Any, List[Any]]:
    """Insert or update one or more relations (edges) in the graph.

    Mirrors the `Relations` data model. Each relation's
    ``subj`` and ``obj`` are upserted as needed so every edge has
    both endpoints.

    Args:
        relation_or_relations: A ``Relation`` instance, or a list
            of them (or anything satisfying ``is_relation``).

    Returns:
        The edge id(s) assigned by the backend. Scalar in / scalar
        out; list in / list out.
    """
    self._require_graph_adapter()
    return await self.graph_adapter.update_relations(relation_or_relations)
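
The endpoint-upsert behaviour can be illustrated with two in-memory dicts. This is a sketch under stated assumptions, not the adapter's code: relations are plain dicts whose `subj` and `obj` hold full node dicts, `node_key`, `relation_key` and `upsert_relations` are hypothetical helpers, and keys follow the Primary Key Convention (first property after the reserved fields).

```python
from typing import Any, Dict, List, Tuple

RESERVED = ("subj", "label", "obj")
Node = Dict[str, Any]
Key = Tuple[str, Any]  # (label, primary key value)


def node_key(node: Node) -> Key:
    """(label, first property after `label`)."""
    value = next(v for name, v in node.items() if name != "label")
    return node["label"], value


def relation_key(relation: Dict[str, Any]) -> Key:
    """(label, first property after subj / label / obj)."""
    value = next(v for name, v in relation.items() if name not in RESERVED)
    return relation["label"], value


def upsert_relations(
    nodes: Dict[Key, Node],
    edges: Dict[Key, Dict[str, Any]],
    relations: List[Dict[str, Any]],
) -> List[Any]:
    ids = []
    for relation in relations:
        # Upsert both endpoints first so the edge never dangles.
        for endpoint in (relation["subj"], relation["obj"]):
            nodes[node_key(endpoint)] = dict(endpoint)
        key = relation_key(relation)
        properties = {k: v for k, v in relation.items() if k not in RESERVED}
        # Store endpoints as references to node keys, not as copies.
        edges[key] = {
            **properties,
            "subj": node_key(relation["subj"]),
            "obj": node_key(relation["obj"]),
        }
        ids.append(key[1])
    return ids


paris = {"label": "City", "name": "Paris"}
france = {"label": "Country", "name": "France"}
relation = {"subj": paris, "label": "IsCapitalOf", "obj": france, "id": "r1"}

nodes: Dict[Key, Node] = {}
edges: Dict[Key, Dict[str, Any]] = {}
edge_ids = upsert_relations(nodes, edges, [relation])
```

Starting from an empty store, a single relation is enough to create both nodes and the edge between them.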
